# Docker Model Runner Docker Model Runner is a feature of [[Docker Desktop]] that lets you download and run AI/ML models locally using Docker CLI commands. It provides a simple interface to pull models from registries (including Docker Hub and Hugging Face), run inference, and expose an OpenAI-compatible API endpoint for integration with existing tools and applications. Models run natively on the host machine (using GPU acceleration when available), not inside containers. This means near-bare-metal performance for inference. The familiar Docker UX (`docker model pull`, `docker model run`, `docker model ls`) makes it accessible to developers already comfortable with Docker workflows. ## Key Commands ```bash # List available models docker model ls # Pull a model docker model pull ai/llama3.2 # Run a model interactively docker model run ai/llama3.2 # Serve a model via OpenAI-compatible API docker model serve ai/llama3.2 ``` ## OpenAI-Compatible API When a model is served, it exposes an API at `http://localhost` (via Docker Desktop's built-in proxy) that follows the OpenAI chat completions format. Any tool or library that supports the OpenAI API can point to this endpoint, making it a drop-in local replacement for cloud-based LLM APIs. ## Use Cases - Local AI development and testing without cloud API costs - Privacy-sensitive workloads where data cannot leave the machine - Offline development environments - Prototyping AI-powered features before committing to a cloud provider - Running [[Model Context Protocol (MCP)]] tool servers alongside local models ## References - https://docs.docker.com/model-runner/ - https://www.docker.com/blog/docker-model-runner/ ## Related - [[Docker AI]] - [[Docker Desktop]] - [[Docker]] - [[Docker MCP Catalog]] - [[Docker Compose for AI]] - [[Ollama]] - [[Model Context Protocol (MCP)]]