LiteLLM Proxy Configuration

# LiteLLM Proxy Configuration The [[LiteLLM]] proxy is a self-hosted [[AI Gateway]] shipped as a [[Docker]] image. It's configured through a single `config.yaml` file that declares available models, provider credentials, and gateway-wide settings. Runs on `http://0.0.0.0:4000` by default. ## Docker images - `docker.litellm.ai/berriai/litellm:main-latest` — core proxy - `ghcr.io/berriai/litellm-database:main-latest` — bundles [[PostgreSQL]] client support for virtual keys and spend tracking - Non-root variant available for hardened deployments ## Minimal run command ```bash docker run \ -v $(pwd)/litellm_config.yaml:/app/config.yaml \ -e AZURE_API_KEY=your-key \ -e AZURE_API_BASE=your-base \ -p 4000:4000 \ docker.litellm.ai/berriai/litellm:main-latest \ --config /app/config.yaml --detailed_debug ``` ## config.yaml structure ```yaml model_list: - model_name: gpt-4o # client-facing alias litellm_params: model: azure/my_azure_deployment # <provider>/<model-id> api_base: os.environ/AZURE_API_BASE # env var reference api_key: os.environ/AZURE_API_KEY api_version: "2025-01-01-preview" - model_name: claude-sonnet litellm_params: model: anthropic/claude-sonnet-4 api_key: os.environ/ANTHROPIC_API_KEY general_settings: master_key: sk-1234 # admin token, must start with sk- database_url: "postgresql://user:pw@host:5432/litellm" litellm_settings: drop_params: true # drop unsupported params silently num_retries: 3 request_timeout: 600 ``` ### Key sections - **`model_list`** — every model the gateway will serve. `model_name` is the public alias clients pass; `litellm_params.model` uses the `<provider>/<model-id>` routing prefix. Multiple entries can share the same `model_name` for load balancing across deployments. - **`general_settings`** — `master_key` (admin auth), `database_url` (enables virtual keys + spend tracking), SSO / JWT config. - **`litellm_settings`** — SDK-level behavior: retries, timeouts, caching, callbacks, fallbacks. - **`router_settings`** — routing strategy (`simple-shuffle`, `least-busy`, `usage-based-routing-v2`, `latency-based-routing`). ## Virtual keys With `database_url` configured, the master key can mint scoped keys via the admin API: ```bash curl -L -X POST 'http://0.0.0.0:4000/key/generate' \ -H 'Authorization: Bearer sk-1234' \ -H 'Content-Type: application/json' \ -d '{"models": ["gpt-4o"], "rpm_limit": 60, "max_budget": 25.0, "duration": "30d"}' ``` Each virtual key can restrict models, set RPM/TPM caps, enforce budgets, and tie spend to a user or team. ## Production stack A typical compose setup bundles: - `docker-compose.yml` — LiteLLM proxy + Postgres (+ optional Redis for caching) - `config.yaml` — model list and gateway settings - `.env` — master key, salt key, provider credentials - `prometheus.yml` — metrics scrape config (LiteLLM exposes `/metrics`) All files must exist before `docker compose up` or the container exits on startup. ## Client usage Any OpenAI-compatible SDK ([[Python]], Node, [[LangChain]], curl) works by pointing `base_url` at the proxy and using a virtual key as the API key: ```python from openai import OpenAI client = OpenAI(base_url="http://localhost:4000", api_key="sk-virtual-key") client.chat.completions.create(model="claude-sonnet", messages=[...]) ``` ## References - https://docs.litellm.ai/docs - https://docs.litellm.ai/docs/proxy/docker_quick_start - https://docs.litellm.ai/docs/proxy/configs ## Related - [[LiteLLM]] - [[LiteLLM Claude Code Proxy]] - [[AI Gateway]] - [[Docker]] - [[PostgreSQL]] - [[Model routing]] - [[AI Observability]]