Natural Language Processing

RAG (Retrieval-Augmented Generation)

A technique that enhances LLM responses by retrieving relevant external information before generating an answer

#RAG #LLM #Vector DB

What is RAG?

Retrieval-Augmented Generation (RAG) is a method that combines information retrieval with text generation. Instead of relying solely on what a language model memorized during training, RAG first searches a knowledge base for relevant documents and then feeds that information to the model so it can craft a more accurate answer.
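The retrieval half of RAG typically ranks documents by vector similarity between the query embedding and each document embedding. As a minimal sketch (the tiny hand-written 3-dimensional "embeddings" below are purely illustrative -- real systems use embedding models with hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, docs, top_k=1):
    """Return the top_k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

# Toy corpus with made-up embeddings, for illustration only.
docs = [
    {"text": "Refund policy: returns accepted within 30 days.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 3-5 business days.",
     "embedding": [0.1, 0.9, 0.2]},
]
query = [0.85, 0.15, 0.05]  # pretend embedding of "What is the refund window?"
best = retrieve(query, docs)[0]
```

A vector database performs essentially this ranking, but with approximate nearest-neighbor indexes so it scales to millions of documents.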

Think of it like an open-book exam. A standard LLM takes a closed-book test -- it can only use what it already learned. RAG lets the model flip through a reference book before answering, dramatically reducing the chance of mistakes.

How Does It Work?

  1. Retrieve -- When a user asks a question, the system converts the query into a vector embedding and searches a vector database for the most relevant documents or passages.
  2. Augment -- The retrieved text is attached to the original prompt as additional context.
  3. Generate -- The LLM reads both the question and the retrieved context, then produces a grounded, evidence-backed response.
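The three steps above can be sketched as a single pipeline. The `retriever` and `llm` callables here are hypothetical stand-ins for a real vector-database lookup and a real model API:

```python
def build_augmented_prompt(question, passages):
    """Attach retrieved passages to the user question (the Augment step)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def rag_answer(question, retriever, llm):
    # 1. Retrieve: look up passages relevant to the question.
    passages = retriever(question)
    # 2. Augment: merge them into the prompt as extra context.
    prompt = build_augmented_prompt(question, passages)
    # 3. Generate: hand the grounded prompt to the model.
    return llm(prompt)

# Stub retriever and LLM so the pipeline runs end to end.
fake_retriever = lambda q: ["Returns are accepted within 30 days of purchase."]
fake_llm = lambda prompt: f"(model received a {len(prompt)}-character grounded prompt)"
result = rag_answer("What is the refund window?", fake_retriever, fake_llm)
```

In production the stubs would be replaced by an embedding-based search and an LLM call, but the control flow -- retrieve, augment, generate -- stays the same.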

Why Does It Matter?

RAG helps solve two major LLM weaknesses: hallucination (confidently making things up) and stale knowledge (a model only knows what existed before its training cutoff). By pulling in up-to-date, domain-specific documents, RAG keeps answers accurate and current without the cost of retraining the entire model.

Key Examples

  • Enterprise Q&A systems that search internal wikis before answering.
  • Customer support bots that reference product documentation in real time.
