Code Mode MCP Pattern - DeveloPassion

# Code Mode MCP Pattern

Code Mode is a pattern for using [[Model Context Protocol (MCP)|MCP]] tools where, instead of exposing each tool directly to the model as a callable function, you give the model a typed API ([[TypeScript]] or Python) and ask it to **write code** that calls those tools. The code runs in a sandbox, and only the final result comes back to the model. The term was popularized by [[Cloudflare]] in 2025, and the same idea now ships in libraries like [[FastMCP]].

The motivation is simple. Models have seen enormous amounts of real code during training, and very little synthetic tool-calling. So they're much better at writing a few lines of code than at chaining a dozen special-token tool calls by hand. As Cloudflare put it: LLMs are better at writing code to call MCP than at calling MCP directly.

## The problem it solves

Classic MCP wiring has two costs that grow fast:

- **Every tool definition lives in the [[Context Window]].** Connect a few servers with a few hundred tools between them, and you can burn tens of thousands of tokens before the model has even read the user's request.
- **Every intermediate result round-trips through the model.** Tool A's output gets copied into the model's input just to become the argument for tool B. The model pays, in tokens and in latency, to shuffle data it never actually needed to reason about.

Keep that second point in mind. It's the one most people miss.

## How it works

1. **Schema to API.** On connection, the MCP server's schema gets converted into typed interface definitions with doc comments. The model reads an API, not a flat list of tools.
2. **The model writes code.** It produces a small script that calls that API. It can chain several operations, filter, loop, branch, whatever the task needs, all in one shot.
3. **Sandbox execution.** The code runs in an isolated sandbox. Cloudflare uses V8 isolates that spin up in a few milliseconds; FastMCP runs Python with time and memory limits (30s and 100MB by default). Network access is locked down so credentials can't leak.
4. **Only the result returns.** Intermediate values stay in the sandbox. The model gets the answer, not the plumbing.

Some implementations go a step further with **progressive discovery**: rather than dumping every tool upfront, they expose a couple of meta-tools (typically `search` and `execute`) so the model looks up only what it needs, when it needs it.

## Why it matters: context, efficiency, scalability

This is where the pattern earns its keep.

- **Context window.** You stop paying the upfront tax of loading every schema. Cloudflare's headline example: for a large surface like the full Cloudflare API, Code Mode cut input tokens by 99.9%. The traditional equivalent would have needed around 1.17 million tokens, which is more than most context windows can even hold. A bloated context isn't just expensive; it actively confuses the model and degrades its answers.
- **Efficiency.** Because intermediate results never flow back through the model, you cut both token spend and round-trips. A task that used to be five separate tool calls (five trips through the neural network) becomes one script and one result.
- **Scalability.** With the `search` + `execute` approach, context usage stays roughly constant no matter how big the underlying API grows. You can even compose multiple MCP servers behind a single gateway (Cloudflare calls these "Code Mode Server Portals"), so adding the tenth or the hundredth server doesn't flood the agent.

The thing that used to break first under growth, the context window, stops being the bottleneck.

## My Opinion

I think this pattern genuinely matters, and it maps directly to a problem I keep hitting myself: the more agentic resources you add (AI Skills, MCP servers, agents), the faster they fill the context window and confuse the model. See [[Agentic Resource Discovery Specification (ARD)]] for the discovery side of the same pain. Code Mode attacks it from the MCP side.

## References

- https://blog.cloudflare.com/code-mode/
- https://blog.cloudflare.com/code-mode-mcp/
- https://gofastmcp.com/servers/transforms/code-mode
- https://medium.com/@amirkiarafiei/mcp-code-mode-context-engineering-for-efficient-tool-execution-in-llm-agents-c46e1ddf80ac
- https://dev.to/aws-heroes/code-mode-for-mcp-the-long-tail-escape-hatch-not-the-front-door-40ga
- https://news.ycombinator.com/item?id=45399204

## Related

- [[Model Context Protocol (MCP)]]
- [[AI Agents]]
- [[Agentic Engineering]]
- [[Agentic Resource Discovery Specification (ARD)]]
- [[Large Language Models (LLMs)]]
- [[Context Window]]
- [[Cloudflare]]
- [[Claude Code]]
- [[FastMCP]]
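
## Example sketch

To make the four steps under "How it works" more concrete, here is a minimal Python sketch of the flow. Everything in it is an assumption for illustration: `list_orders`, `sum_totals`, the `ORDERS` data, and `run_script` are hypothetical names, not part of Cloudflare's or FastMCP's APIs. In particular, `exec()` with a trimmed namespace is not a real sandbox; it only stands in for the isolated runtime the pattern depends on.

```python
import json

# Hypothetical stand-ins for two MCP tools. A real setup would proxy these
# calls to an MCP server instead of reading from in-memory data.
ORDERS = {
    "cust-1": [
        {"id": "o-1", "total": 120.0, "status": "paid"},
        {"id": "o-2", "total": 75.5, "status": "refunded"},
        {"id": "o-3", "total": 310.0, "status": "paid"},
    ]
}


def list_orders(customer_id: str) -> list[dict]:
    """Return every order for a customer (hypothetical tool)."""
    return ORDERS.get(customer_id, [])


def sum_totals(orders: list[dict]) -> float:
    """Add up the totals of the given orders (hypothetical tool)."""
    return sum(order["total"] for order in orders)


# The kind of script a model might write against that API: it chains calls,
# filters, and assigns only the final answer to `result`.
MODEL_SCRIPT = """
orders = list_orders("cust-1")
paid = [o for o in orders if o["status"] == "paid"]
result = {"paid_orders": len(paid), "paid_total": sum_totals(paid)}
"""


def run_script(code: str, api: dict) -> object:
    """Run the script with only `api` in scope and return its `result`.

    NOTE: exec() with a trimmed namespace is NOT a security boundary. It only
    stands in for the isolated sandbox the pattern relies on.
    """
    namespace = {"__builtins__": {"len": len, "sum": sum}, **api}
    exec(code, namespace)
    # Intermediate values (`orders`, `paid`) stay behind in `namespace`;
    # only `result` is handed back to the caller.
    return namespace["result"]


api = {"list_orders": list_orders, "sum_totals": sum_totals}
result = run_script(MODEL_SCRIPT, api)
print(json.dumps(result))
```

The full order list never leaves `run_script`: the caller, standing in for the model, sees only the small summary dictionary. That is the second cost from "The problem it solves" being avoided, in miniature.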