# Model routing Model routing is the practice of directing different tasks to different AI models based on the task's requirements. Instead of using one model for everything, a routing layer selects the most appropriate model considering capability, cost, speed, and context needs. This is the [[Receptionist AI Design Pattern]] applied at the model level. A simple classification (or even a smaller, faster model) decides whether a task needs a powerful reasoning model like [[Claude|Claude Opus]] or can be handled by a faster, cheaper model like Haiku. [[AI Subagents]] naturally implement this: the parent agent uses a premium model while subagents can run on lighter models for routine tasks like code review or file exploration. Routing criteria: - **Task complexity**: simple lookups vs. multi-step reasoning - **Latency requirements**: real-time responses vs. background processing - **Cost sensitivity**: high-volume tasks benefit from cheaper models - **Context length**: some tasks need large [[Context Window|context windows]], others don't - **Specialization**: some models excel at code, others at creative writing or analysis [[OpenRouter]] and [[AI Gateway]] solutions provide infrastructure for model routing, offering a unified API across multiple providers with automatic fallback, load balancing, and cost optimization. The trade-off is routing accuracy. Misrouting a complex task to a cheap model produces bad output. Misrouting a simple task to an expensive model wastes money. Getting this right requires [[AI Observability]] to track quality per model per task type. ## References - ## Related - [[Receptionist AI Design Pattern]] - [[AI Subagents]] - [[AI Agent Orchestration]] - [[AI Gateway]] - [[OpenRouter]] - [[Large Language Models (LLMs)]] - [[Context Window]] - [[AI Observability]]