# LiteLLM
LiteLLM is an open-source [[AI Gateway]] that exposes 100+ LLM providers through a single OpenAI-compatible interface. It ships in two forms: a Python SDK for embedding the gateway logic inside applications, and a self-hostable proxy server (the "LLM Gateway") that centralizes access, spend tracking, and governance for teams. MIT-licensed; maintained by BerriAI.
## Two modes of use
- **Python SDK** — import `litellm`, call any provider with `completion(model="anthropic/claude-sonnet-4", ...)`. Drop-in retries, fallbacks, streaming, cost calculation, and observability callbacks (Langfuse, MLflow, S3).
- **Proxy server** — self-hosted gateway on port 4000 by default. Any OpenAI-compatible client (OpenAI SDK, [[LangChain]], curl) can point at it. Adds virtual keys, budgets, rate limits, an admin UI, and Postgres-backed persistence. See [[LiteLLM Proxy Configuration]].
## Supported providers
100+ providers via `<provider>/<model>` prefix: [[OpenAI]], [[Anthropic]], Azure OpenAI, AWS Bedrock, [[Gemini]] / Vertex AI, Hugging Face, Groq, Deepseek, Mistral, Cohere, Ollama, vLLM, plus any OpenAI-compatible endpoint. The same client code targets all of them.
## Key features
- **Unified OpenAI format** — requests and responses normalized across providers, including streaming (SSE), tool calls, and vision inputs
- **[[Model routing]] and load balancing** — round-robin, least-busy, latency-based across multiple deployments of the same logical model
- **Automatic fallbacks** — on error, rate-limit, or context-window overflow, retry on a backup model or provider
- **Cost tracking** — accurate per-request cost computed across all providers, attributed to keys, users, teams, or organizations
- **Rate limits and budgets** — RPM, TPM, and max-parallel-request caps; hard/soft budget enforcement per virtual key
- **Virtual keys** — issue per-user or per-team `sk-...` keys with scoped models, budgets, and limits
- **Observability** — Prometheus metrics, detailed request logs, callback hooks to Langfuse, MLflow, Datadog, S3
- **[[AI Guardrails]]** — pluggable content filtering and PII redaction at the gateway layer
- **Caching** — in-memory or Redis caching of repeated prompts
- **Enterprise add-ons** — SSO, JWT auth, audit logs, granular RBAC
## Typical use case
Platform teams running LiteLLM Proxy in front of every LLM call in the company: developers get one endpoint and one key, the platform gets spend attribution, quota enforcement, and provider-agnostic fallback. Reduces vendor lock-in and puts a single control point in front of [[Large Language Models (LLMs)]] consumption.
## Comparison
Unlike [[OpenRouter]] (hosted aggregator, usage markup) or [[Vercel AI Gateway]] (managed, tight Vercel integration, narrower catalog), LiteLLM is self-hosted, open-source, and optimized for enterprise governance. Trade-off: you operate the infrastructure yourself (Postgres, Redis, container orchestration).
## References
- https://www.litellm.ai/
- https://docs.litellm.ai/docs
- https://docs.litellm.ai/docs/proxy/docker_quick_start
- Source code: https://github.com/BerriAI/litellm
## Related
- [[LiteLLM Proxy Configuration]]
- [[LiteLLM Claude Code Proxy]]
- [[Claude Code via GitHub Copilot]]
- [[AI Gateway]]
- [[OpenRouter]]
- [[Vercel AI Gateway]]
- [[Model routing]]
- [[AI Observability]]
- [[AI Guardrails]]
- [[Bring Your Own Key (BYOK)]]
- [[Large Language Models (LLMs)]]
- [[Anthropic]]
- [[OpenAI]]
- [[Python]]
- [[Docker]]