# LiteLLM LiteLLM is an open-source [[AI Gateway]] that exposes 100+ LLM providers through a single OpenAI-compatible interface. It ships in two forms: a Python SDK for embedding the gateway logic inside applications, and a self-hostable proxy server (the "LLM Gateway") that centralizes access, spend tracking, and governance for teams. MIT-licensed; maintained by BerriAI. ## Two modes of use - **Python SDK** — import `litellm`, call any provider with `completion(model="anthropic/claude-sonnet-4", ...)`. Drop-in retries, fallbacks, streaming, cost calculation, and observability callbacks (Langfuse, MLflow, S3). - **Proxy server** — self-hosted gateway on port 4000 by default. Any OpenAI-compatible client (OpenAI SDK, [[LangChain]], curl) can point at it. Adds virtual keys, budgets, rate limits, an admin UI, and Postgres-backed persistence. See [[LiteLLM Proxy Configuration]]. ## Supported providers 100+ providers via `<provider>/<model>` prefix: [[OpenAI]], [[Anthropic]], Azure OpenAI, AWS Bedrock, [[Gemini]] / Vertex AI, Hugging Face, Groq, Deepseek, Mistral, Cohere, Ollama, vLLM, plus any OpenAI-compatible endpoint. The same client code targets all of them. ## Key features - **Unified OpenAI format** — requests and responses normalized across providers, including streaming (SSE), tool calls, and vision inputs - **[[Model routing]] and load balancing** — round-robin, least-busy, latency-based across multiple deployments of the same logical model - **Automatic fallbacks** — on error, rate-limit, or context-window overflow, retry on a backup model or provider - **Cost tracking** — accurate per-request cost computed across all providers, attributed to keys, users, teams, or organizations - **Rate limits and budgets** — RPM, TPM, and max-parallel-request caps; hard/soft budget enforcement per virtual key - **Virtual keys** — issue per-user or per-team `sk-...` keys with scoped models, budgets, and limits - **Observability** — Prometheus metrics, detailed request logs, callback hooks to Langfuse, MLflow, Datadog, S3 - **[[AI Guardrails]]** — pluggable content filtering and PII redaction at the gateway layer - **Caching** — in-memory or Redis caching of repeated prompts - **Enterprise add-ons** — SSO, JWT auth, audit logs, granular RBAC ## Typical use case Platform teams running LiteLLM Proxy in front of every LLM call in the company: developers get one endpoint and one key, the platform gets spend attribution, quota enforcement, and provider-agnostic fallback. Reduces vendor lock-in and puts a single control point in front of [[Large Language Models (LLMs)]] consumption. ## Comparison Unlike [[OpenRouter]] (hosted aggregator, usage markup) or [[Vercel AI Gateway]] (managed, tight Vercel integration, narrower catalog), LiteLLM is self-hosted, open-source, and optimized for enterprise governance. Trade-off: you operate the infrastructure yourself (Postgres, Redis, container orchestration). ## References - https://www.litellm.ai/ - https://docs.litellm.ai/docs - https://docs.litellm.ai/docs/proxy/docker_quick_start - Source code: https://github.com/BerriAI/litellm ## Related - [[LiteLLM Proxy Configuration]] - [[LiteLLM Claude Code Proxy]] - [[Claude Code via GitHub Copilot]] - [[AI Gateway]] - [[OpenRouter]] - [[Vercel AI Gateway]] - [[Model routing]] - [[AI Observability]] - [[AI Guardrails]] - [[Bring Your Own Key (BYOK)]] - [[Large Language Models (LLMs)]] - [[Anthropic]] - [[OpenAI]] - [[Python]] - [[Docker]]