# Headroom
**Headroom is a compression layer that shrinks the context you send to an LLM by 60–95% before inference, without touching answer quality.** Tool outputs, logs, RAG results, files, conversation history; all squeezed on the way in. Apache 2.0.
Why it pays off: agents waste tokens chewing on verbose tool outputs and logs, and output tokens cost up to 5x input on Opus-class models. Headroom cuts the redundancy with no code changes required. This is [[Context Engineering]] applied as infrastructure rather than prompt craft.
## What's inside
- **ContentRouter.** Detects content type and picks the right compressor.
- **SmartCrusher.** Universal JSON compression.
- **CodeCompressor.** AST-aware compression for Python, JavaScript, Go, Rust, Java, C++.
- **Kompress-base.** A HuggingFace model trained on agentic traces for text compression.
- **CCR (reversible compression).** Caches the originals locally; the LLM pulls the full version back via a tool only when it actually needs it.
- **Output reduction.** Verbosity steering and effort routing trim the model's own output, not just the input.
- **Cross-agent memory.** A shared, deduplicated store across Claude, Codex, and Gemini.
- **CacheAligner.** Stabilizes prefixes so provider KV caches keep hitting.
## Three ways to run it
1. **Library** — call `compress(messages)` directly in Python or TypeScript.
2. **Proxy** — a drop-in OpenAI-compatible proxy; zero code changes.
3. **MCP server** — `headroom_compress`, `headroom_retrieve`, `headroom_stats`.
Flow: CacheAligner → ContentRouter → specialized compressors → CCR storage.
## Usage
```bash
pip install "headroom-ai[all]"
headroom wrap claude # wrap a coding agent
headroom proxy --port 8787 # standalone proxy
```
Python 3.10+, with Rust kernels for the hot paths and a TypeScript SDK. Optional GPU via PyTorch MPS. Pulls ONNX Runtime and the Kompress-base model at runtime.
## References
- https://github.com/chopratejas/headroom
- https://headroom-docs.vercel.app/docs
## Related
- [[Context Engineering]]
- [[Agentic Context Engineering]]
- [[AI Tokenization]]
- [[Model Context Protocol (MCP)]]
- [[AI Retrieval Patterns]]
- [[CocoIndexCode]]