# Headroom

**Headroom is a compression layer that shrinks the context you send to an LLM by 60–95% before inference, without touching answer quality.** Tool outputs, logs, RAG results, files, and conversation history are all squeezed on the way in. Apache 2.0.

Why it pays off: agents waste tokens chewing on verbose tool outputs and logs, and output tokens cost up to 5x as much as input tokens on Opus-class models. Headroom cuts out the redundancy, and in proxy mode it needs no code changes at all. This is [[Context Engineering]] applied as infrastructure rather than prompt craft.

## What's inside

- **ContentRouter.** Detects the content type and picks the right compressor.
- **SmartCrusher.** Universal JSON compression.
- **CodeCompressor.** AST-aware compression for Python, JavaScript, Go, Rust, Java, and C++.
- **Kompress-base.** A HuggingFace model trained on agentic traces for text compression.
- **CCR (reversible compression).** Caches the originals locally; the LLM pulls the full version back through a tool only when it actually needs it.
- **Output reduction.** Verbosity steering and effort routing trim the model's own output, not just the input.
- **Cross-agent memory.** A shared, deduplicated store across Claude, Codex, and Gemini.
- **CacheAligner.** Stabilizes prompt prefixes so provider KV caches keep hitting.

## Three ways to run it

1. **Library**: call `compress(messages)` directly from Python or TypeScript.
2. **Proxy**: a drop-in OpenAI-compatible proxy; zero code changes.
3. **MCP server**: exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats`.

Flow: CacheAligner → ContentRouter → specialized compressors → CCR storage.

## Usage

```bash
pip install "headroom-ai[all]"
headroom wrap claude          # wrap a coding agent
headroom proxy --port 8787    # standalone proxy
```

Requires Python 3.10+, with Rust kernels for the hot paths and a TypeScript SDK. Optional GPU acceleration via PyTorch MPS. Pulls in ONNX Runtime and the Kompress-base model at runtime.
## References

- https://github.com/chopratejas/headroom
- https://headroom-docs.vercel.app/docs

## Related

- [[Context Engineering]]
- [[Agentic Context Engineering]]
- [[AI Tokenization]]
- [[Model Context Protocol (MCP)]]
- [[AI Retrieval Patterns]]
- [[CocoIndexCode]]