# LLM Tool Calling Also called function calling. The pattern where a [[Large Language Models (LLMs)|LLM]] is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A *host* (an [[AI Agent Harness]], an SDK, or the model provider's own platform) executes the tool and feeds the result back, letting the LLM continue reasoning. The model itself never executes anything; it only emits structured intent. ## How It Works 1. Host registers tools with the LLM session: name, description, JSON Schema for parameters 2. User sends a prompt 3. LLM either responds with text OR with a structured tool call: `{ "tool": "search_web", "args": { "query": "..." } }` 4. Host executes the tool, captures the result 5. Result is sent back to the LLM as a new message 6. LLM uses the result to produce its final response This loop can repeat (multi-step reasoning across multiple tool calls). ## Who Executes the Tool The model emits the tool call. Something else runs it. *That something* varies, and the difference matters for security, observability, and portability. ### 1. The AI agent harness (most common today) A harness like [[Claude Code]], [[OpenCode]], [[Cursor.com]], or [[Aider]] sits between the user and the model. It receives the structured tool call, executes it locally (file system, shell, browser, [[Model Context Protocol (MCP)]] servers), and sends the result back. The harness owns auth, the execution sandbox, the audit log, and any human-in-the-loop confirmation. See [[AI Agent Harness]] for the full pattern. ### 2. The model provider's API or platform Increasingly, the API/platform around the LLM executes selected tools server-side without the host ever seeing the raw call. Examples: - OpenAI's hosted tools (web search, code interpreter, file search) run inside OpenAI's infrastructure; you enable them with a flag. - Anthropic's server-side tools (web search, code execution) run in Anthropic's environment. - Anthropic's [[Claude Managed Agents]] execute the entire harness, including arbitrary tool calls, in Anthropic-managed containers. - "Configurable integrations" on hosted assistant platforms (ChatGPT connectors, Claude integrations, Gemini extensions) let users wire in third-party services that the platform invokes on the model's behalf. This shifts execution responsibility from the host application to the model provider; the host only configures, the platform runs. ### Why the distinction matters - **Security boundary**: harness-executed tools touch your machine; platform-executed tools touch the provider's infrastructure (and whatever they connect out to). - **Latency and cost**: platform-executed tools avoid round-trip to your host but charge per invocation in the API bill. - **Portability**: code that depends on platform-executed tools breaks when you switch providers; harness-executed tools (especially over [[Model Context Protocol (MCP)]]) port more cleanly. - **Observability**: you see harness-executed calls in your own logs; you only see platform-executed calls in whatever telemetry the provider exposes. - **Trust**: you own the harness execution; you must trust the provider for platform execution. A real production system usually mixes both: the harness handles local actions and proprietary integrations, the platform handles general-purpose tools (search, code interpreter) where its native implementation outperforms what you'd build. ## Why It Matters Tool calling turns an LLM from a text generator into a controller. It's the foundation of: - AI agents that take actions in the world - Retrieval-augmented chatbots - LLMs that integrate with databases, APIs, file systems, browsers - The [[Model Context Protocol (MCP)]] and [[Claude Code]]'s tool use ## Where It Shows Up | API | Tool Calling Support | |---|---| | OpenAI API | Native (`tools` parameter) | | Anthropic Claude API | Native (`tools` parameter) | | Google Gemini API | Native | | W3C [[Prompt API]] | Supported via `tools` config | | [[Gemini Nano]] | Supported on-device | | Local LLMs (llama.cpp, Ollama) | Varies by model fine-tuning | ## Design Considerations - **Schema clarity**: tool descriptions must be unambiguous; the LLM uses them to choose - **Error handling**: tools can fail; the LLM needs the error message to recover or retry - **Cost**: each tool call round-trip consumes tokens; chains can be expensive - **Safety**: tools that mutate state need authorization and logging ## Relationship to Structured Outputs Tool calling is a constrained form of [[LLM Structured Outputs]] — the model output must conform to one of the registered tool schemas, plus the choice of which tool. Many runtimes implement both via the same constrained-decoding mechanism. ## References - https://www.ibm.com/think/topics/tool-calling - https://github.com/webmachinelearning/prompt-api ## Related - [[Large Language Models (LLMs)]] - [[LLM Structured Outputs]] - [[LLM Streaming]] - [[Prompt API]] - [[Gemini Nano]] - [[AI Agents]] - [[AI Agent Harness]] - [[Claude Code]] - [[Claude Managed Agents]] - [[Model Context Protocol (MCP)]] - [[Browser-Provided Language Models]] - [[Constrained Decoding]] - [[Agentic Engineering]]