RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by retrieving relevant external information before generating an answer
What is RAG?
Retrieval-Augmented Generation (RAG) is a method that combines information retrieval with text generation. Instead of relying solely on what a language model memorized during training, RAG first searches a knowledge base for relevant documents and then feeds that information to the model so it can craft a more accurate answer.
Think of it like an open-book exam. A standard LLM takes a closed-book test -- it can answer only with what it already learned. RAG lets the model flip through a reference book before answering, dramatically reducing the chance of mistakes.
How Does It Work?
- Retrieve -- When a user asks a question, the system converts the query into a vector embedding and searches a vector database for the most relevant documents or passages.
- Augment -- The retrieved text is attached to the original prompt as additional context.
- Generate -- The LLM reads both the question and the retrieved context, then produces a grounded, evidence-backed response.
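The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the bag-of-words "embedding" stands in for a real embedding model, the in-memory list stands in for a vector database, and names like `retrieve` and `augment` are hypothetical helpers, not a specific library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.
    A real system would call an embedding model instead."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1 (Retrieve): rank documents by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query: str, passages: list[str]) -> str:
    """Step 2 (Augment): attach the retrieved passages to the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

docs = [
    "The warranty covers manufacturing defects for 24 months.",
    "Our office is closed on public holidays.",
    "Refunds are processed within 5 business days.",
]

query = "How long is the warranty?"
prompt = augment(query, retrieve(query, docs))
# Step 3 (Generate): in a real system, `prompt` would now be sent to the LLM,
# which answers grounded in the retrieved passages.
print(prompt)
```

Running this builds a prompt whose context section leads with the warranty passage, because it scores highest against the query; swapping in a real embedding model and vector store changes the retrieval quality, not the shape of the loop.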
Why Does It Matter?
RAG helps solve two major LLM weaknesses: hallucination (making things up) and stale knowledge (training data has a cutoff date). By pulling in up-to-date, domain-specific documents, RAG keeps answers accurate and current without the cost of retraining the entire model.
Key Examples
- Enterprise Q&A systems that search internal wikis before answering.
- Customer support bots that reference product documentation in real time.