Natural Language Processing

Chunk

A text segment created by splitting long documents into meaningful units for retrieval and generation

#Chunk #Chunking #RAG #Retrieval

What is a Chunk?

A chunk is a text segment created by splitting a long document into smaller units.
In RAG systems, documents are usually indexed and retrieved at chunk level rather than as one full file.

Think of it as turning a large report into small reference cards, then pulling only the cards you need.

How does it work?

A common pipeline looks like this:

  1. Split source documents by token length and semantic boundaries (paragraphs/sections).
  2. Convert each chunk into an embedding and store it in a vector database.
  3. Retrieve relevant chunks for a query and attach them to the LLM context.
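
The three steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: it assumes blank lines mark semantic boundaries, uses bag-of-words counts as a stand-in for a real embedding model, and keeps the "vector database" as a plain in-memory list. All function names (`split_paragraphs`, `embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_paragraphs(document: str) -> list[str]:
    """Step 1: split on blank lines (assumed stand-in for semantic boundaries)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def embed(text: str) -> Counter:
    """Step 2 (toy): word counts in place of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Step 3a: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Step 3b: attach the retrieved chunks to the LLM context."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


document = (
    "Refunds are issued within 14 days of purchase.\n\n"
    "Shipping takes 3 to 5 business days.\n\n"
    "Support is available by email on weekdays."
)
# The in-memory list plays the role of the vector database.
index = [(chunk, embed(chunk)) for chunk in split_paragraphs(document)]
query = "How long does shipping take?"
print(build_prompt(query, retrieve(query, index, k=1)))
```

In a real system, `embed` would call an embedding model and the list would be replaced by a vector store, but the shape of the flow stays the same: split, index, retrieve, attach.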

In practice, chunk size and overlap are major tuning parameters.
If chunks are too small, context is fragmented across many pieces. If they are too large, retrieval precision drops and token cost rises.
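
To make the two parameters concrete, here is a small sketch of fixed-size chunking with overlap. It assumes whitespace-separated words as a rough proxy for tokens (a real pipeline would count tokens with the model's tokenizer), and the function name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `size` words.

    Each chunk after the first repeats the last `overlap` words
    of the previous chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    words = text.split()  # assumption: words approximate tokens
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


for chunk in chunk_words("a b c d e f g h i j", size=4, overlap=1):
    print(chunk)
```

With `size=4` and `overlap=1`, the ten-word input yields three chunks, and each neighbouring pair shares one word. Raising `overlap` reduces the chance that a sentence is cut off from its context, at the price of more chunks to embed and store.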

Why does it matter?

Chunk design is one of the biggest drivers of RAG quality. It directly affects:

  • Retrieval precision
  • Grounded answer quality
  • Cost and latency efficiency

In short, chunking strategy can matter as much as model choice.

Related terms