# LLM Tool Calling
Also called function calling. The pattern where a [[Large Language Models (LLMs)|LLM]] is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A *host* (an [[AI Agent Harness]], an SDK, or the model provider's own platform) executes the tool and feeds the result back, letting the LLM continue reasoning. The model itself never executes anything; it only emits structured intent.
## How It Works
1. Host registers tools with the LLM session: name, description, JSON Schema for parameters
2. User sends a prompt
3. LLM either responds with text OR with a structured tool call: `{ "tool": "search_web", "args": { "query": "..." } }`
4. Host executes the tool, captures the result
5. Result is sent back to the LLM as a new message
6. LLM uses the result to produce its final response
This loop can repeat (multi-step reasoning across multiple tool calls).
## Who Executes the Tool
The model emits the tool call. Something else runs it. *That something* varies, and the difference matters for security, observability, and portability.
### 1. The AI agent harness (most common today)
A harness like [[Claude Code]], [[OpenCode]], [[Cursor.com]], or [[Aider]] sits between the user and the model. It receives the structured tool call, executes it locally (file system, shell, browser, [[Model Context Protocol (MCP)]] servers), and sends the result back. The harness owns auth, the execution sandbox, the audit log, and any human-in-the-loop confirmation. See [[AI Agent Harness]] for the full pattern.
### 2. The model provider's API or platform
Increasingly, the API/platform around the LLM executes selected tools server-side without the host ever seeing the raw call. Examples:
- OpenAI's hosted tools (web search, code interpreter, file search) run inside OpenAI's infrastructure; you enable them with a flag.
- Anthropic's server-side tools (web search, code execution) run in Anthropic's environment.
- Anthropic's [[Claude Managed Agents]] execute the entire harness, including arbitrary tool calls, in Anthropic-managed containers.
- "Configurable integrations" on hosted assistant platforms (ChatGPT connectors, Claude integrations, Gemini extensions) let users wire in third-party services that the platform invokes on the model's behalf.
This shifts execution responsibility from the host application to the model provider; the host only configures, the platform runs.
### Why the distinction matters
- **Security boundary**: harness-executed tools touch your machine; platform-executed tools touch the provider's infrastructure (and whatever they connect out to).
- **Latency and cost**: platform-executed tools avoid round-trip to your host but charge per invocation in the API bill.
- **Portability**: code that depends on platform-executed tools breaks when you switch providers; harness-executed tools (especially over [[Model Context Protocol (MCP)]]) port more cleanly.
- **Observability**: you see harness-executed calls in your own logs; you only see platform-executed calls in whatever telemetry the provider exposes.
- **Trust**: you own the harness execution; you must trust the provider for platform execution.
A real production system usually mixes both: the harness handles local actions and proprietary integrations, the platform handles general-purpose tools (search, code interpreter) where its native implementation outperforms what you'd build.
## Why It Matters
Tool calling turns an LLM from a text generator into a controller. It's the foundation of:
- AI agents that take actions in the world
- Retrieval-augmented chatbots
- LLMs that integrate with databases, APIs, file systems, browsers
- The [[Model Context Protocol (MCP)]] and [[Claude Code]]'s tool use
## Where It Shows Up
| API | Tool Calling Support |
|---|---|
| OpenAI API | Native (`tools` parameter) |
| Anthropic Claude API | Native (`tools` parameter) |
| Google Gemini API | Native |
| W3C [[Prompt API]] | Supported via `tools` config |
| [[Gemini Nano]] | Supported on-device |
| Local LLMs (llama.cpp, Ollama) | Varies by model fine-tuning |
## Design Considerations
- **Schema clarity**: tool descriptions must be unambiguous; the LLM uses them to choose
- **Error handling**: tools can fail; the LLM needs the error message to recover or retry
- **Cost**: each tool call round-trip consumes tokens; chains can be expensive
- **Safety**: tools that mutate state need authorization and logging
## Relationship to Structured Outputs
Tool calling is a constrained form of [[LLM Structured Outputs]] — the model output must conform to one of the registered tool schemas, plus the choice of which tool. Many runtimes implement both via the same constrained-decoding mechanism.
## References
- https://www.ibm.com/think/topics/tool-calling
- https://github.com/webmachinelearning/prompt-api
## Related
- [[Large Language Models (LLMs)]]
- [[LLM Structured Outputs]]
- [[LLM Streaming]]
- [[Prompt API]]
- [[Gemini Nano]]
- [[AI Agents]]
- [[AI Agent Harness]]
- [[Claude Code]]
- [[Claude Managed Agents]]
- [[Model Context Protocol (MCP)]]
- [[Browser-Provided Language Models]]
- [[Constrained Decoding]]
- [[Agentic Engineering]]