# Context Window

The context window is the maximum amount of text (measured in tokens) that a [[Large Language Models (LLMs)|Large Language Model]] can process in a single interaction. It includes everything: the system prompt, conversation history, retrieved documents, tool outputs, and the model's own response. Once the window is full, older information must be dropped, summarized, or compressed.

Context windows have grown dramatically. Early models had 4K-8K token windows. By 2025, [[Claude]] offered 200K tokens, and [[Claude|Claude Opus 4.6]] (2025) pushed this to 1 million tokens. Google's [[Gemini]] models reached similar scales. This expansion fundamentally changed what fits in a single context: entire codebases, long documents, and rich conversation histories.

## Longer context does not mean infinite attention

More tokens in the window do not automatically mean better results. Models can struggle to attend equally to all parts of a long context. Early long-context models showed a "lost in the middle" effect: information at the start and end of the context was recalled well, but information buried in the middle was often missed.

This has improved significantly. Modern models like Claude Opus 4.6 demonstrate near-perfect recall across the full 1M-token window on needle-in-a-haystack benchmarks, which test whether a model can retrieve a single planted fact from a long context. Models are getting genuinely better at finding and using information regardless of its position. But the fundamental constraint remains: [[AI context is finite with diminishing returns]], and filling the window with low-quality context still degrades output.

## Why this matters for context engineering

The context window is the hard constraint that all of [[Context Engineering]] works within. Every design pattern in the field, from [[Progressive Disclosure]] and [[Prompt Lazy Loading AI Design Pattern (PLL)]] to [[Context Compression]] and [[Context Hygiene]], exists because context is a finite, precious resource.

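As a rough illustration of the simplest policy mentioned at the top of this note (drop older information once the window is full), the sketch below reserves room for the system prompt and the model's response, then keeps the newest conversation turns that still fit. It is a minimal sketch under stated assumptions: `Message`, `approx_tokens`, and `fit_to_window` are hypothetical names, and token counts use a crude ~4-characters-per-token estimate rather than any real model's tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str  # e.g. "user", "assistant", "tool"
    text: str


def approx_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: assumes ~4 characters per token."""
    return max(1, (len(text) + 3) // 4)


def fit_to_window(
    system_prompt: str,
    history: List[Message],
    window: int,
    reserve_for_output: int,
    count: Callable[[str], int] = approx_tokens,
) -> List[Message]:
    """Keep the newest messages that fit; drop the oldest ones first.

    The window must hold the system prompt, the kept history, and the
    tokens reserved for the model's own response.
    """
    budget = window - reserve_for_output - count(system_prompt)
    if budget < 0:
        raise ValueError("system prompt and output reserve exceed the window")

    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest so recent turns are preferred.
    for msg in reversed(history):
        cost = count(msg.text)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        Message("user", "a" * 400),       # ~100 tokens
        Message("assistant", "b" * 400),  # ~100 tokens
        Message("user", "c" * 40),        # ~10 tokens
    ]
    kept = fit_to_window("You are helpful.", history, window=150, reserve_for_output=30)
    print([m.role for m in kept])  # → ['assistant', 'user']
```

Dropping the oldest turns is the bluntest option. Summarizing them instead of discarding them is the territory of [[Context Compression]], and deciding how to split the window between the system prompt, history, and output is what a [[Token Budget]] formalizes.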

Even with million-token windows, the signal-to-noise ratio matters more than raw capacity. A lean, well-curated 50K context often outperforms a noisy 500K one. The [[Token Budget]] is how practitioners reason about this constraint in practice.

## Related

- [[Large Language Models (LLMs)]]
- [[Claude]]
- [[Anthropic]]
- [[Token Budget]]
- [[Context Engineering]]
- [[Context Compression]]
- [[Context Hygiene]]
- [[Context Bloat]]
- [[AI context is finite with diminishing returns]]
- [[Progressive Disclosure]]
- [[Prompt Lazy Loading AI Design Pattern (PLL)]]
- [[Natural tension between compression and context]]