# Edgee
Edgee is an **agent gateway** — a transparent proxy that sits between AI coding agents (Claude Code, Codex, Cursor, OpenCode) and LLM providers (Anthropic, OpenAI, GLM, others). Its pitch is "Compress, route, observe": cut token costs up to 50%, extend coding sessions ~30%, all without changing your agent's code.
## The Problem It Solves
AI coding agents burn tokens fast: tool results get repeated, context is verbose, and sessions run long. They also have no fallback when a provider hiccups, and teams paying the bill get minimal visibility. Edgee inserts itself between agent and API to address all of these at once.
## Core Operations
| Operation | What it does |
|---|---|
| **Compress** | Trims tool results and applies brevity optimization on input/output payloads. Claims 60–90% reduction while preserving semantic integrity for coding tasks. |
| **Route** | Provider fallback. If a request fails, Edgee automatically retries it against an alternative provider, so the agent never sees the outage. |
| **Observe** | Per-request latency, errors, usage, and cost telemetry — sliced by model, application, environment, repository, PR, and team seat. |
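The Route row describes ordered provider fallback. A minimal sketch of that idea follows, assuming each provider can be modeled as a plain callable; `route_with_fallback`, `flaky_primary`, and `healthy_secondary` are hypothetical names for illustration and are not part of Edgee's API.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def route_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would likely retry only on retryable errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def healthy_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = route_with_fallback(
    "refactor utils.py",
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
)
print(used, reply)
```

The caller only sees the successful response plus the name of the provider that served it, which is roughly what "the agent never sees the outage" implies. How Edgee actually decides which failures are retryable is not covered here.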
## Key Features
- **Bring Your Own Keys (BYOK)** — use your own Anthropic / OpenAI credentials; Edgee doesn't gatekeep billing.
- **Multi-provider** — works with any OpenAI- or Anthropic-compatible client.
- **Team management** — seat tracking, cost-per-repo, cost-per-PR.
- **Multi-language SDKs** — TypeScript, Python, Go, Rust + drop-in compatibility with the official OpenAI SDK, Anthropic SDK, LangChain, and raw cURL.
- **CLI launcher** — `edgee` wraps and launches Claude Code / Codex / OpenCode through the proxy with one command.
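To make the cost-per-repo and cost-per-PR idea concrete, here is a small sketch of how per-request telemetry could be rolled up. The `RequestRecord` fields and the token prices are assumptions chosen for illustration; they are not Edgee's schema or any provider's actual rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """One proxied request (hypothetical schema)."""
    repo: str
    pr: int
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.0
OUTPUT_PRICE_PER_MTOK = 15.0


def request_cost(rec: RequestRecord) -> float:
    """Cost of a single request in USD under the assumed prices."""
    raw = rec.input_tokens * INPUT_PRICE_PER_MTOK + rec.output_tokens * OUTPUT_PRICE_PER_MTOK
    return raw / 1_000_000


def cost_per_pr(records: list[RequestRecord]) -> dict[tuple[str, int], float]:
    """Sum request costs keyed by (repo, pr)."""
    totals: dict[tuple[str, int], float] = defaultdict(float)
    for rec in records:
        totals[(rec.repo, rec.pr)] += request_cost(rec)
    return dict(totals)


records = [
    RequestRecord("web", 101, 200_000, 10_000),
    RequestRecord("web", 101, 100_000, 20_000),
    RequestRecord("api", 7, 50_000, 5_000),
]
for (repo, pr), usd in sorted(cost_per_pr(records).items()):
    print(f"{repo} PR #{pr}: ${usd:.3f}")
```

Grouping by a different key (team seat, environment, model) only changes the tuple used in `cost_per_pr`.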
## Why This Matters
Coding agents are often the most token-hungry workload a developer runs. A 50% cost cut on an Opus-class workload can be the difference between sustainable solo usage and hitting subscription caps. The team-level observability also turns "AI tools" from an unpredictable expense into a measurable per-PR cost.
The trade-off worth checking: every proxy adds a hop, a trust boundary, and a single point of failure. Compression that claims to preserve "semantic integrity for coding tasks" is still lossy, so it needs validation on your codebase, not theirs.
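One way to approach that validation is to check whether the strings your agent depends on survive compression. The sketch below uses a naive head/tail trimmer as a stand-in, since Edgee's actual compression algorithm is not described here; `trim_tool_result` and `missing_after_compression` are hypothetical helpers.

```python
def trim_tool_result(text: str, head: int = 5, tail: int = 5) -> str:
    """Naive stand-in compressor: keep the first `head` and last `tail` lines."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    kept_tail = lines[len(lines) - tail:]  # safe even when tail == 0
    return "\n".join(lines[:head] + [f"[... {omitted} lines omitted ...]"] + kept_tail)


def missing_after_compression(original: str, compressed: str, must_keep: list[str]) -> list[str]:
    """Return the must-keep snippets present in the original but lost in the compressed text."""
    return [s for s in must_keep if s in original and s not in compressed]


# A fake test-runner log: one header line, 40 passing tests, one failure, one summary.
log = "\n".join(
    ["collecting tests"]
    + [f"test_{i} PASSED" for i in range(40)]
    + ["test_40 FAILED: AssertionError in parse_config", "1 failed, 40 passed"]
)
compressed = trim_tool_result(log, head=2, tail=2)
lost = missing_after_compression(log, compressed, ["test_40 FAILED", "test_20 PASSED"])
print(lost)
```

In practice you would replace the stand-in with real before/after payloads captured from the gateway, and fill `must_keep` with the file paths, error messages, or symbols your tasks cannot afford to lose.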
## Comparable Tools / Adjacent Space
- Direct API SDKs with manual prompt caching — cheaper but no observability layer.
- [[MLflow]] — LLM observability and gateway features for production apps (different audience: ML/agent ops, not coding-agent users).
- LiteLLM, OpenRouter — multi-provider routing without the compression / coding-agent focus.
- Anthropic's native prompt caching — orthogonal; Edgee likely stacks on top.
## Installation & Setup
```bash
# macOS (Homebrew)
brew install edgee
# Linux / Windows — see docs
# Launch a coding agent through the proxy
edgee claude
edgee codex
edgee opencode
```
## References
- [Edgee Website](https://www.edgee.ai/)
- [Edgee Documentation](https://www.edgee.ai/docs)
- [Edgee LLMs.txt Index](https://www.edgee.ai/docs/llms.txt)
- [Edgee Pricing](https://www.edgee.ai/pricing)
- [Edgee GitHub Org (edgee-cloud)](https://github.com/edgee-cloud)
- [Edgee Blog](https://www.edgee.ai/blog)
## Related
- [[MLflow]] — adjacent LLM observability / gateway for production ML and agents
- [[Claude Code]] — primary target client for Edgee
- [[Anthropic]] — LLM provider Edgee proxies
- [[OpenAI]] — LLM provider Edgee proxies
- [[LLM Monitoring]] — broader observability theme
- [[AI Observability]] — broader observability theme
- [[LangSmith]] — hosted LLM observability and eval platform
- [[Langfuse]] — open-source self-hostable LLM observability
- [[Helicone]] — open-source LLM observability via gateway proxy
- LLM cost optimization — broader theme Edgee addresses (no dedicated note yet)
- Prompt caching — complementary technique stacked beneath gateway compression (no dedicated note yet)