# Docker Model Runner
Docker Model Runner is a feature of [[Docker Desktop]] that lets you download and run AI/ML models locally using Docker CLI commands. It provides a simple interface to pull models from registries (including Docker Hub and Hugging Face), run inference, and expose an OpenAI-compatible API endpoint for integration with existing tools and applications.
Models run natively on the host machine (using GPU acceleration when available), not inside containers. This means near-bare-metal performance for inference. The familiar Docker UX (`docker model pull`, `docker model run`, `docker model ls`) makes it accessible to developers already comfortable with Docker workflows.
## Key Commands
```bash
# List available models
docker model ls
# Pull a model
docker model pull ai/llama3.2
# Run a model interactively
docker model run ai/llama3.2
# Serve a model via OpenAI-compatible API
docker model serve ai/llama3.2
```
## OpenAI-Compatible API
When a model is served, it exposes an API at `http://localhost` (via Docker Desktop's built-in proxy) that follows the OpenAI chat completions format. Any tool or library that supports the OpenAI API can point to this endpoint, making it a drop-in local replacement for cloud-based LLM APIs.
## Use Cases
- Local AI development and testing without cloud API costs
- Privacy-sensitive workloads where data cannot leave the machine
- Offline development environments
- Prototyping AI-powered features before committing to a cloud provider
- Running [[Model Context Protocol (MCP)]] tool servers alongside local models
## References
- https://docs.docker.com/model-runner/
- https://www.docker.com/blog/docker-model-runner/
## Related
- [[Docker AI]]
- [[Docker Desktop]]
- [[Docker]]
- [[Docker MCP Catalog]]
- [[Docker Compose for AI]]
- [[Ollama]]
- [[Model Context Protocol (MCP)]]