# Context Compression

Context compression refers to techniques that reduce the volume of information in an AI [[Context Window]] while preserving its utility. When the [[Token Budget]] runs tight, compression is the alternative to dropping information entirely.

The [[Context Engineering]] note outlines the main approaches:

- **KV cache management**: optimizing the key-value cache that stores attention state (for example, by evicting or compressing low-value entries), so the model keeps more relevant information within a fixed memory budget
- **Hierarchical memory**: tiered storage where detailed context is offloaded and only summaries remain in the active window
- **Recurrent compression**: progressive summarization of older context as the conversation grows longer
- **Selective attention**: focusing compute on the most relevant parts of the context

The core tension is described in [[Natural tension between compression and context]]: compress too aggressively and you lose nuance; keep too much and you get [[Context Bloat]]. Good compression preserves the signal while reducing the noise. Bad compression flattens important distinctions.

In practice, [[Claude Code]] and similar [[AI Agent Harness|AI agent harnesses]] handle this automatically by summarizing older conversation turns as the context window fills. At the knowledge-system level (CLAUDE.md files, skills, memories), however, compression is a manual discipline. Writing concise, information-dense context entries is itself a form of compression, one that draws on the same skills as good technical writing.

## Related

- [[Context Window]]
- [[Token Budget]]
- [[Context Engineering]]
- [[Natural tension between compression and context]]
- [[Context Bloat]]
- [[Context Hygiene]]
- [[AI Agent Harness]]
- [[Claude Code]]