Chunk

What is a Chunk?

A chunk is a text segment created by splitting a long document into smaller units.
In RAG systems, documents are usually indexed and retrieved at the chunk level rather than as one full file.

Think of it as turning a large report into small reference cards, then pulling only the cards you need.

How does it work?

A common pipeline looks like this:

1. Split source documents by token length and semantic boundaries (paragraphs/sections).
2. Convert each chunk into an embedding and store it in a vector database.
3. Retrieve the chunks most relevant to a query and attach them to the LLM context.
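
The pipeline above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: whitespace-separated words stand in for tokens, a word-count vector stands in for a real embedding model, and an in-memory list stands in for a vector database. All function names here (split_into_chunks, embed, retrieve, build_prompt) are hypothetical and not taken from any library.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_tokens=40):
    """Pack whole paragraphs into chunks, aiming for at most max_tokens words.

    A single paragraph longer than max_tokens is kept whole in this sketch.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Attach the retrieved chunks to the text sent to the LLM."""
    context = "\n---\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


document = (
    "Refunds are issued within 14 days of a return request.\n\n"
    "Shipping takes three to five business days within the country.\n\n"
    "Support is available by email on weekdays."
)
chunks = split_into_chunks(document, max_tokens=12)
index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for a vector DB
question = "How long do refunds take?"
print(build_prompt(question, retrieve(question, index, k=1)))
```

In a real system the embed function would call an embedding model and the index would live in a vector store, but the three stages keep the same shape.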

In practice, chunk size and overlap are major tuning parameters.
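If chunks are too small, each one loses the surrounding context needed to interpret it. If they are too large, retrieval precision drops because a single chunk mixes relevant and irrelevant text, and cost rises because more tokens are attached to the LLM context.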
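
To make the two parameters concrete, here is a small sliding-window sketch. It assumes the document has already been tokenized into a list, and the function name chunk_with_overlap is hypothetical. Each chunk holds up to size tokens and repeats the last overlap tokens of the previous chunk.

```python
def chunk_with_overlap(tokens, size, overlap):
    """Slide a window of `size` tokens, repeating `overlap` tokens each step."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the window already reached the end of the document
    return chunks


tokens = [f"t{i}" for i in range(10)]  # placeholder tokens t0..t9
for size, overlap in [(4, 0), (4, 2), (8, 2)]:
    chunks = chunk_with_overlap(tokens, size, overlap)
    total = sum(len(c) for c in chunks)
    print(f"size={size} overlap={overlap} chunks={len(chunks)} tokens_stored={total}")
```

On this 10-token input, adding an overlap of 2 to size-4 chunks raises the chunk count from 3 to 4 and the stored tokens from 10 to 16, which is the cost side of the trade-off: overlap keeps context across boundaries but increases what must be embedded and stored.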

Why does it matter?

Chunk design is one of the biggest drivers of RAG quality. It directly affects:

Retrieval precision
Grounded answer quality
Cost and latency efficiency

In short, chunking strategy can matter as much as model choice.
