Tool Calling: How AI Agents Use Tools
Tool Calling (also Function Calling) is the core capability that lets an AI Agent move beyond pure text generation: the LLM receives machine-readable descriptions of tools (APIs, data sources) and, when needed, produces a structured call with parameters that the application executes. The result flows back into the model, which then plans the next step. This turns the LLM from a text generator into an acting agent that perceives, decides and triggers actions in real systems.
Key Takeaways
- ✓Tool Calling is the bridge between the LLM and the outside world: the model returns a structured call (function name plus parameters in a JSON schema), the application executes it and feeds the result back into the reasoning loop (Perceive-Reason-Act-Observe) as an observation.
- ✓The LLM does not call tools itself - it proposes the call. Execution, validation and permissions sit with the surrounding code. That is exactly where tool permissions, input validation and human-in-the-loop for irreversible actions belong.
- ✓The most common source of error is not the models but unclear or overlapping tool definitions. Precise names, unambiguous descriptions and a when-not-to-use clause have a stronger effect than any prompt trick.
- ✓3-7 actively loaded tools are 2026 best practice; from around 10 tools onward, measurable degradation in tool choice begins. Dynamic tool discovery instead of static full loading keeps the selection sharp.
- ✓The Model Context Protocol (MCP) is the open standard for tool integration: an agent can access tools and data sources in a standardized way via MCP servers, without writing custom glue code for every integration.
- ✓Security is not an add-on: least-privilege scopes per tool, allowlists, confirmation steps for writing or irreversible actions and observability across all tool calls are mandatory as soon as an agent touches real systems.
What is Tool Calling?
Tool Calling (also Function Calling or Tool-Use) is the core capability that lets an AI Agent move beyond pure text generation. The LLM receives machine-readable descriptions of tools – functions, APIs, data sources – and, when needed, produces a structured call with parameters that the surrounding application executes. The result flows back into the model as an observation, which then plans the next step. This turns the LLM from a text generator into an acting agent.
The three core answers up front:
- What happens technically? The model returns not running prose but a structured call: a function name plus arguments in JSON format. The application executes this call and feeds the result back into the model.
- Who actually does the calling? Not the LLM itself. The model proposes a call; the actual execution – including permission checks and validation – sits in the surrounding code. This separation is the foundation of every security architecture.
- Why is this the core capability of agents? Without Tool-Use, an LLM remains an advisor that can only talk. Only Tool Calling turns the reasoning engine into an agent that perceives, plans, acts and observes – the full Perceive → Reason → Act → Observe loop.
A concrete example: check availability and book an appointment
Suppose an agent is to book a consultation appointment for a customer. The model is provided with two tools, each with a name, a description and a parameter schema:
get_availability(date: string)– returns free slots for a date.book_appointment(slot_id: string, customer_email: string)– books a slot.
The flow:
- Reason: The user writes "Would you have time next Tuesday morning?". The model recognizes that it first needs the availability.
- Act: Instead of a text answer, the model returns a structured call:
get_availability(date: "2026-06-16"). - Execute: The application – not the model – calls the real calendar API and receives two free slots.
- Observe: The result (
["09:00", "10:30"]) is fed back into the model as an observation. - Loop: The model formulates a follow-up question or – after confirmation – calls
book_appointment(...).
Important: the writing step (book_appointment) is irreversible in terms of its effect. This is exactly where a human-in-the-loop or confirmation step belongs, before the agent executes the action.
The mechanics of Function Calling
Function Calling follows the same basic pattern in all common LLM APIs, regardless of the provider:
- Tool definition: Each tool is described via a JSON schema – name, a natural-language description of its purpose, and the typed parameters. This description is the only information the model uses to decide whether and how it uses the tool.
- Tool selection: In the reasoning step, the model selects the appropriate tool and fills in the parameters. It uses exclusively the provided descriptions plus the conversation context.
- Structured output: Instead of prose, the model delivers a machine-readable object. Constrained decoding, or schema-bound generation, ensures the output matches the expected format.
- Execution & observation: The call is executed, the result returned, and the loop continues until the goal is reached or a stopping criterion (loop limit, error, guardrail) takes effect.
The most important practical finding: when an agent acts incorrectly, the cause usually lies not with the model but with the tool definition. Anthropic states the guideline sharply – if a human engineer cannot unambiguously say which tool to use in a situation, neither can an agent.
Three design rules follow from this:
- Limit the tool count: 3-7 actively loaded tools are 2026 best practice; from around 10 tools onward, measurable degradation in tool choice begins. In Anthropic's internal MCP evals, tool-selection accuracy with dynamic tool search rose from 49% to 74% (Opus 4) and from 79.5% to 88.1% (Opus 4.5), respectively.
- Avoid overlap: Two tools that could plausibly answer the same request are the problem that no prompt solves. A clear "when-not-to-use" clause in the description is the most effective, and often forgotten, component.
- Unambiguous names and parameters: Descriptive names and tightly typed parameters reduce misinterpretations before they arise.
The connection to MCP: a standard for tool integration
Without a standard, every tool integration has to be wired up individually – custom glue code per agent, per API, per data source. The Model Context Protocol (MCP) solves this M-to-N problem: it defines a uniform protocol through which an agent (MCP client) talks to tools and data sources (MCP servers) in a standardized way.
MCP is thus the "how" behind Tool Calling once tools no longer live in your own code but are provided as reusable servers. An MCP server provides tools, resources and prompts; the agent discovers them at runtime and calls them via the same Function-Calling pattern. The level of maturity is high: the MCP specification is available in version 2025-11-25; the project was handed over to the Linux Foundation (Agentic AI Foundation) in December 2025, and there are already more than 10,000 MCP servers.
While MCP standardizes the agent ↔ tool connection, the complementary protocol A2A (Agent-to-Agent) governs agent ↔ agent communication in multi-agent systems. A2A has likewise been with the Linux Foundation since June 2025 and is backed by more than 150 organizations. For pure Tool Calling, MCP is the relevant standard; A2A only becomes important at the L5 (multi-agent) level.
Security: tool permissions as a must, not an option
As soon as an agent touches real systems – writes to databases, sends emails, executes code – tool security becomes a central requirement. The good news: because the LLM only proposes calls and the surrounding code executes them, the control point sits exactly where it belongs.
Aspect | Risk without protection | Measure |
|---|---|---|
Permissions (scopes) | Agent gets full access via one API token | Least privilege: only the minimally necessary scopes per tool, separate credentials per tool |
Call validation | Model produces invalid or harmful parameters | Allowlists, schema and value-range validation in the executor, not in the model |
Irreversible actions | Agent deletes, books or sends without control | Human-in-the-loop / confirmation step for writing and non-reversible calls |
Prompt/tool injection | Manipulated tool response steers the agent | Encapsulate untrusted output, quarantine, never treat unchecked tool output as a command |
Traceability | Misbehavior cannot be reconstructed | Observability: log every tool call with parameters, result and decision context |
Three pitfalls from practice are especially common: treating agents deterministically, even though tool choice is probabilistic; running no observability, so that errors in the loop remain invisible; and providing no human-in-the-loop for irreversible actions. In compliance-sensitive DACH contexts, there is an additional point: writing actions on personal data touch GDPR obligations, and automated decisions with legal effect can fall under Art. 22 GDPR (informational, not legal advice).
Conclusion
Tool Calling is the mechanism that turns an LLM into an agent: the model proposes structured calls, the code executes them, the result drives the next step. The quality is decided in three places – precise, non-overlapping tool definitions, a lean active tool set (3-7) with dynamic discovery, and a security architecture of least privilege, validation, confirmation steps and observability. MCP provides the open standard for this, raising tool integration from individual integrations to reusable servers. Whoever masters these fundamentals builds agents that act reliably and under control in production.
FAQ
What is the difference between Tool Calling and Function Calling?
Does the LLM call the tools itself?
What does MCP have to do with Tool Calling?
How many tools should an agent have at the same time?
How do you secure tool calls?
What is the most common source of error in Tool Calling?
Want to go deeper?
Get new analyses straight to your inbox – or see how we put this knowledge to work for companies.