RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by retrieving relevant external information before generating an answer
What is RAG?
Retrieval-Augmented Generation (RAG) is a method that combines information retrieval with text generation. Instead of relying solely on what a language model memorized during training, RAG first searches a knowledge base for relevant documents and then feeds that information to the model so it can craft a more accurate answer.
Think of it like an open-book exam. A standard LLM takes a closed-book test -- it can answer only with what it already learned. RAG lets the model flip through a reference book before answering, dramatically reducing the chance of mistakes.
How Does It Work?
- Retrieve -- When a user asks a question, the system converts the query into a vector embedding and searches a vector database for the most relevant documents or passages.
- Augment -- The retrieved text is attached to the original prompt as additional context.
- Generate -- The LLM reads both the question and the retrieved context, then produces a grounded, evidence-backed response.
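The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the bag-of-words "embedding" stands in for a real embedding model, the in-memory list stands in for a vector database, and names like `retrieve` and `augment` are hypothetical helpers, not a specific library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.
    A real system would call an embedding model instead."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1 (Retrieve): rank documents by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query: str, passages: list[str]) -> str:
    """Step 2 (Augment): attach the retrieved passages to the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

docs = [
    "The warranty covers manufacturing defects for 24 months.",
    "Our office is closed on public holidays.",
    "Refunds are processed within 5 business days.",
]

query = "How long is the warranty?"
prompt = augment(query, retrieve(query, docs))
# Step 3 (Generate): in a real system, `prompt` would now be sent to the LLM,
# which answers grounded in the retrieved passages.
print(prompt)
```

Running this builds a prompt whose context section leads with the warranty passage, because it scores highest against the query; swapping in a real embedding model and vector store changes the retrieval quality, not the shape of the loop.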
Why Does It Matter?
RAG helps solve two major LLM weaknesses: hallucination (making things up) and stale knowledge (training data has a cutoff date). By pulling in up-to-date, domain-specific documents, RAG keeps answers accurate and current without the cost of retraining the entire model.
Key Examples
- Enterprise Q&A systems that search internal wikis before answering.
- Customer support bots that reference product documentation in real time.