Context Window
The maximum number of tokens a model can process in a single request
What is a Context Window?
A context window is the maximum amount of text (in tokens) an AI model can read and reason over in one request.
You can think of it as the model's short-term working memory. If the total input exceeds this limit, the request may be rejected or earlier, lower-priority content may be truncated, depending on how the application handles overflow.
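To make the idea concrete, here is a minimal sketch of a token estimate and limit check. It assumes a rough heuristic of about 4 characters per token for English text; real tokenizers are model-specific, and the function names here are illustrative.

```python
# Rough token estimate, assuming ~4 characters per token (a common
# heuristic for English text; real tokenizers are model-specific).
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(text: str, context_window: int) -> bool:
    # True if the estimated token count fits within the window.
    return estimate_tokens(text) <= context_window

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt), fits_in_context(prompt, 8_000))
```

In production you would replace the heuristic with the tokenizer that matches your model, since counts can differ substantially between tokenizers.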
How does it work?
Everything in scope counts toward the token total: system instructions, user prompts, chat history, and attached content. As that total approaches the limit, you may see:
- Loss of earlier conversation details
- Partial understanding of long documents
- Inconsistent adherence to instructions
For long workflows, teams usually apply chunking, summarization, and periodic context refresh.
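One common form of context refresh is a sliding window over the chat history: keep the most recent messages that fit within a token budget and drop the oldest first. The sketch below assumes the same rough 4-characters-per-token estimate; the function name is hypothetical.

```python
# Minimal sliding-window sketch: retain the newest messages that fit
# within a token budget, dropping the oldest first.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def trim_history(history: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for message in reversed(history):   # walk newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break                       # oldest messages are dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

Summarization goes a step further: instead of dropping the oldest messages outright, they are replaced with a short generated summary so their gist stays in context.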
Why does it matter?
Context window size directly affects answer quality, latency, and cost in real-world use: larger windows let the model see more at once, but processing more tokens takes longer and costs more. It is especially important for document analysis, long coding sessions, and multi-step agent workflows.