# Edgee

Edgee is an **agent gateway**: a transparent proxy that sits between AI coding agents (Claude Code, Codex, Cursor, OpenCode) and LLM providers (Anthropic, OpenAI, GLM, others). Its pitch is "Compress, route, observe": cut token costs by up to 50% and extend coding sessions by ~30%, all without changing your agent's code.

## The Problem It Solves

AI coding agents burn tokens fast: repeated tool results, verbose context, and long-running sessions all add up. They also have no fallback when a provider hiccups, and they give minimal visibility to the teams paying the bill. Edgee inserts itself between agent and API to address all of these at once.

## Core Operations

| Operation | What it does |
|---|---|
| **Compress** | Trims tool results and applies brevity optimization on input/output payloads. Claims 60–90% reduction while preserving semantic integrity for coding tasks. |
| **Route** | Provider fallback. If a request fails, Edgee automatically and transparently retries against an alternative provider, so the agent never sees the outage. |
| **Observe** | Per-request latency, errors, usage, and cost telemetry, sliced by model, application, environment, repository, PR, and team seat. |

## Key Features

- **Bring Your Own Keys (BYOK)**: use your own Anthropic / OpenAI credentials; Edgee doesn't gatekeep billing.
- **Multi-provider**: works with any OpenAI- or Anthropic-compatible client.
- **Team management**: seat tracking, cost-per-repo, cost-per-PR.
- **Multi-language SDKs**: TypeScript, Python, Go, and Rust, plus drop-in compatibility with the official OpenAI SDK, Anthropic SDK, LangChain, and raw cURL.
- **CLI launcher**: `edgee` wraps and launches Claude Code / Codex / OpenCode through the proxy with one command.

## Why This Matters

For most developers, coding agents are the single highest-token-cost workload they run. A 50% cost cut on an Opus-class workload can be the difference between sustainable solo usage and hitting subscription caps.
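To make the cost claim concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `Workload` numbers, the per-million-token prices, and the idea that the reduction applies uniformly to input and output tokens. None of these figures come from Edgee or from any provider's price list, and the 50% figure is Edgee's own "up to" claim, not a measured result.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical monthly token usage for one developer."""
    input_tokens: int
    output_tokens: int


def monthly_cost(workload: Workload,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float,
                 reduction: float = 0.0) -> float:
    """Dollar cost after a uniform fractional token reduction.

    `reduction` is the fraction of tokens removed (0.5 means 50% fewer tokens).
    Applying it uniformly to input and output is a simplifying assumption.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    raw = (workload.input_tokens * input_price_per_mtok
           + workload.output_tokens * output_price_per_mtok)
    return (1.0 - reduction) * raw / 1_000_000


# Illustrative placeholder numbers, not real usage or real prices.
usage = Workload(input_tokens=200_000_000, output_tokens=10_000_000)
baseline = monthly_cost(usage, input_price_per_mtok=10.0, output_price_per_mtok=50.0)
with_gateway = monthly_cost(usage, input_price_per_mtok=10.0,
                            output_price_per_mtok=50.0, reduction=0.5)

print(f"baseline:     ${baseline:,.2f}")
print(f"with gateway: ${with_gateway:,.2f}")
```

Swapping in your own token counts and your provider's current prices gives a quick estimate of what a given reduction would be worth before you commit to running a proxy.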
The team-level observability also turns "AI tools" from an unpredictable expense into a measurable per-PR cost.

The trade-off worth checking: every proxy adds a hop, a trust boundary, and a single point of failure. Lossy compression that claims to preserve semantic integrity for coding tasks needs validation on your codebase, not theirs.

## Comparable Tools / Adjacent Space

- Direct API SDKs with manual prompt caching: cheaper, but no observability layer.
- [[MLflow]]: LLM observability and gateway features for production apps (different audience: ML/agent ops, not coding-agent users).
- LiteLLM, OpenRouter: multi-provider routing without the compression / coding-agent focus.
- Anthropic's native prompt caching: orthogonal; Edgee likely stacks on top.

## Installation & Setup

```bash
# macOS (Homebrew)
brew install edgee

# Linux / Windows: see docs

# Launch a coding agent through the proxy
edgee claude
edgee codex
edgee opencode
```

## References

- [Edgee Website](https://www.edgee.ai/)
- [Edgee Documentation](https://www.edgee.ai/docs)
- [Edgee LLMs.txt Index](https://www.edgee.ai/docs/llms.txt)
- [Edgee Pricing](https://www.edgee.ai/pricing)
- [Edgee GitHub Org (edgee-cloud)](https://github.com/edgee-cloud)
- [Edgee Blog](https://www.edgee.ai/blog)

## Related

- [[MLflow]]: adjacent LLM observability / gateway for production ML and agents
- [[Claude Code]]: primary target client for Edgee
- [[Anthropic]]: LLM provider Edgee proxies
- [[OpenAI]]: LLM provider Edgee proxies
- [[LLM Monitoring]]: broader observability theme
- [[AI Observability]]: broader observability theme
- [[LangSmith]]: hosted LLM observability and eval platform
- [[Langfuse]]: open-source self-hostable LLM observability
- [[Helicone]]: open-source LLM observability via gateway proxy
- LLM cost optimization: broader theme Edgee addresses (no dedicated note yet)
- Prompt caching: complementary technique stacked beneath gateway compression (no dedicated note yet)