# Context Compression
Context compression refers to techniques that reduce the volume of information in an AI model's [[Context Window]] while preserving its utility. When the [[Token Budget]] runs tight, compression is the alternative to dropping information entirely.
The [[Context Engineering]] note outlines the main approaches:
- **KV Cache management**: optimizing the key-value cache that stores per-token attention keys and values, so more relevant context fits in the same memory
- **Hierarchical memory**: tiered storage where detailed context is offloaded and only summaries remain in the active window
- **Recurrent compression**: progressive summarization of older context as conversation length grows
- **Selective attention**: focusing compute on the most relevant parts of the context
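
As one illustration of the hierarchical memory idea above, here is a minimal Python sketch. It assumes a two-tier layout in which full entries are offloaded to an archive and only short summaries stay in the active window. The names `TieredMemory` and `naive_summary` are hypothetical, and the summarizer is a stand-in that truncates text where a real system would likely call a model.

```python
from dataclasses import dataclass, field


def naive_summary(text: str, max_words: int = 8) -> str:
    """Placeholder summarizer (an assumption): keep only the first few words."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


@dataclass
class TieredMemory:
    """Two tiers: summaries stay in the active window, full detail is offloaded."""
    archive: dict[int, str] = field(default_factory=dict)  # full text, outside the window
    active: dict[int, str] = field(default_factory=dict)   # summaries, inside the window
    _next_id: int = 0

    def add(self, text: str) -> int:
        """Store the full entry in the archive and its summary in the window."""
        entry_id = self._next_id
        self._next_id += 1
        self.archive[entry_id] = text
        self.active[entry_id] = naive_summary(text)
        return entry_id

    def window(self) -> str:
        """Render what the model would actually see: one summary line per entry."""
        return "\n".join(f"[{i}] {s}" for i, s in self.active.items())

    def expand(self, entry_id: int) -> str:
        """Pull the full detail back in when a summary is not enough."""
        return self.archive[entry_id]


mem = TieredMemory()
note_id = mem.add(
    "The deploy failed because the migration script assumed "
    "an empty users table in staging"
)
print(mem.window())
print(mem.expand(note_id))
```

The entry id in each summary line is what lets detail be recovered later, so compression here trades window space for an extra lookup and does not discard information.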
The core tension is captured in [[Natural tension between compression and context]]: compress too aggressively and you lose nuance; keep too much and you get [[Context Bloat]]. Good compression preserves the signal while reducing the noise. Bad compression flattens important distinctions.
In practice, [[Claude Code]] and similar [[AI Agent Harness|AI agent harnesses]] handle this automatically by summarizing older conversation turns as the context window fills. But at the knowledge system level (CLAUDE.md files, skills, memories), compression is a manual discipline. Writing concise, information-dense context entries is itself a form of compression, one that draws on the same skill as good technical writing.
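
The automatic behavior described above can be sketched roughly as follows. This is not how Claude Code or any particular harness implements it. It is a minimal Python illustration that assumes a word count as the token estimate and a first-sentence extractor as the summarizer. The function names, the `budget` parameter, and the `keep_recent` parameter are all hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough proxy (an assumption): one token per whitespace-separated word."""
    return len(text.split())


def summarize(turns: list[str]) -> str:
    """Placeholder summarizer: keep the first sentence of each turn.
    A real harness would more likely ask a model for the summary."""
    firsts = [t.split(".")[0].strip() for t in turns]
    return "Summary of earlier turns: " + "; ".join(firsts) + "."


def compact(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Fold older turns into one summary once the total exceeds the budget.
    The most recent turns are kept verbatim."""
    total = sum(estimate_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarize(older)] + recent


history = [
    "Set up the repo. Installed dependencies and ran the linter.",
    "Fixed the failing test. It was a timezone bug in the date parser.",
    "Now add a changelog entry.",
    "Done, see CHANGELOG.",
]
for turn in compact(history, budget=10):
    print(turn)
```

Calling `compact` again as the conversation grows folds the previous summary into the next one, which is the progressive pattern listed as recurrent compression. It also shows the cost: the timezone detail from the second turn does not survive the summary.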
## Related
- [[Context Window]]
- [[Token Budget]]
- [[Context Engineering]]
- [[Natural tension between compression and context]]
- [[Context Bloat]]
- [[Context Hygiene]]
- [[AI Agent Harness]]
- [[Claude Code]]