Skip to content
1.8Beginner6 min

Tool Calling: How AI Agents Use Tools

Blck Alpaca·
Definition

Tool Calling (also Function Calling) is the core capability that lets an AI Agent move beyond pure text generation: the LLM receives machine-readable descriptions of tools (APIs, data sources) and, when needed, produces a structured call with parameters that the application executes. The result flows back into the model, which then plans the next step. This turns the LLM from a text generator into an acting agent that perceives, decides and triggers actions in real systems.

Key Takeaways

  • Tool Calling is the bridge between the LLM and the outside world: the model returns a structured call (function name plus parameters in a JSON schema), the application executes it and feeds the result back into the reasoning loop (Perceive-Reason-Act-Observe) as an observation.
  • The LLM does not call tools itself - it proposes the call. Execution, validation and permissions sit with the surrounding code. That is exactly where tool permissions, input validation and human-in-the-loop for irreversible actions belong.
  • The most common source of error is not the models but unclear or overlapping tool definitions. Precise names, unambiguous descriptions and a when-not-to-use clause have a stronger effect than any prompt trick.
  • 3-7 actively loaded tools are 2026 best practice; from around 10 tools onward, measurable degradation in tool choice begins. Dynamic tool discovery instead of static full loading keeps the selection sharp.
  • The Model Context Protocol (MCP) is the open standard for tool integration: an agent can access tools and data sources in a standardized way via MCP servers, without writing custom glue code for every integration.
  • Security is not an add-on: least-privilege scopes per tool, allowlists, confirmation steps for writing or irreversible actions and observability across all tool calls are mandatory as soon as an agent touches real systems.

What is Tool Calling?

Tool Calling (also Function Calling or Tool-Use) is the core capability that lets an AI Agent move beyond pure text generation. The LLM receives machine-readable descriptions of tools – functions, APIs, data sources – and, when needed, produces a structured call with parameters that the surrounding application executes. The result flows back into the model as an observation, which then plans the next step. This turns the LLM from a text generator into an acting agent.

The three core answers up front:

  • What happens technically? The model returns not running prose but a structured call: a function name plus arguments in JSON format. The application executes this call and feeds the result back into the model.
  • Who actually does the calling? Not the LLM itself. The model proposes a call; the actual execution – including permission checks and validation – sits in the surrounding code. This separation is the foundation of every security architecture.
  • Why is this the core capability of agents? Without Tool-Use, an LLM remains an advisor that can only talk. Only Tool Calling turns the reasoning engine into an agent that perceives, plans, acts and observes – the full Perceive → Reason → Act → Observe loop.

A concrete example: check availability and book an appointment

Suppose an agent is to book a consultation appointment for a customer. The model is provided with two tools, each with a name, a description and a parameter schema:

  1. get_availability(date: string) – returns free slots for a date.
  2. book_appointment(slot_id: string, customer_email: string) – books a slot.

The flow:

  1. Reason: The user writes "Would you have time next Tuesday morning?". The model recognizes that it first needs the availability.
  2. Act: Instead of a text answer, the model returns a structured call: get_availability(date: "2026-06-16").
  3. Execute: The application – not the model – calls the real calendar API and receives two free slots.
  4. Observe: The result (["09:00", "10:30"]) is fed back into the model as an observation.
  5. Loop: The model formulates a follow-up question or – after confirmation – calls book_appointment(...).

Important: the writing step (book_appointment) is irreversible in terms of its effect. This is exactly where a human-in-the-loop or confirmation step belongs, before the agent executes the action.

The mechanics of Function Calling

Function Calling follows the same basic pattern in all common LLM APIs, regardless of the provider:

  1. Tool definition: Each tool is described via a JSON schema – name, a natural-language description of its purpose, and the typed parameters. This description is the only information the model uses to decide whether and how it uses the tool.
  2. Tool selection: In the reasoning step, the model selects the appropriate tool and fills in the parameters. It uses exclusively the provided descriptions plus the conversation context.
  3. Structured output: Instead of prose, the model delivers a machine-readable object. Constrained decoding, or schema-bound generation, ensures the output matches the expected format.
  4. Execution & observation: The call is executed, the result returned, and the loop continues until the goal is reached or a stopping criterion (loop limit, error, guardrail) takes effect.

The most important practical finding: when an agent acts incorrectly, the cause usually lies not with the model but with the tool definition. Anthropic states the guideline sharply – if a human engineer cannot unambiguously say which tool to use in a situation, neither can an agent.

Three design rules follow from this:

  • Limit the tool count: 3-7 actively loaded tools are 2026 best practice; from around 10 tools onward, measurable degradation in tool choice begins. In Anthropic's internal MCP evals, tool-selection accuracy with dynamic tool search rose from 49% to 74% (Opus 4) and from 79.5% to 88.1% (Opus 4.5), respectively.
  • Avoid overlap: Two tools that could plausibly answer the same request are the problem that no prompt solves. A clear "when-not-to-use" clause in the description is the most effective, and often forgotten, component.
  • Unambiguous names and parameters: Descriptive names and tightly typed parameters reduce misinterpretations before they arise.

The connection to MCP: a standard for tool integration

Without a standard, every tool integration has to be wired up individually – custom glue code per agent, per API, per data source. The Model Context Protocol (MCP) solves this M-to-N problem: it defines a uniform protocol through which an agent (MCP client) talks to tools and data sources (MCP servers) in a standardized way.

MCP is thus the "how" behind Tool Calling once tools no longer live in your own code but are provided as reusable servers. An MCP server provides tools, resources and prompts; the agent discovers them at runtime and calls them via the same Function-Calling pattern. The level of maturity is high: the MCP specification is available in version 2025-11-25; the project was handed over to the Linux Foundation (Agentic AI Foundation) in December 2025, and there are already more than 10,000 MCP servers.

While MCP standardizes the agent ↔ tool connection, the complementary protocol A2A (Agent-to-Agent) governs agent ↔ agent communication in multi-agent systems. A2A has likewise been with the Linux Foundation since June 2025 and is backed by more than 150 organizations. For pure Tool Calling, MCP is the relevant standard; A2A only becomes important at the L5 (multi-agent) level.

Security: tool permissions as a must, not an option

As soon as an agent touches real systems – writes to databases, sends emails, executes code – tool security becomes a central requirement. The good news: because the LLM only proposes calls and the surrounding code executes them, the control point sits exactly where it belongs.

Aspect

Risk without protection

Measure

Permissions (scopes)

Agent gets full access via one API token

Least privilege: only the minimally necessary scopes per tool, separate credentials per tool

Call validation

Model produces invalid or harmful parameters

Allowlists, schema and value-range validation in the executor, not in the model

Irreversible actions

Agent deletes, books or sends without control

Human-in-the-loop / confirmation step for writing and non-reversible calls

Prompt/tool injection

Manipulated tool response steers the agent

Encapsulate untrusted output, quarantine, never treat unchecked tool output as a command

Traceability

Misbehavior cannot be reconstructed

Observability: log every tool call with parameters, result and decision context

Three pitfalls from practice are especially common: treating agents deterministically, even though tool choice is probabilistic; running no observability, so that errors in the loop remain invisible; and providing no human-in-the-loop for irreversible actions. In compliance-sensitive DACH contexts, there is an additional point: writing actions on personal data touch GDPR obligations, and automated decisions with legal effect can fall under Art. 22 GDPR (informational, not legal advice).

Conclusion

Tool Calling is the mechanism that turns an LLM into an agent: the model proposes structured calls, the code executes them, the result drives the next step. The quality is decided in three places – precise, non-overlapping tool definitions, a lean active tool set (3-7) with dynamic discovery, and a security architecture of least privilege, validation, confirmation steps and observability. MCP provides the open standard for this, raising tool integration from individual integrations to reusable servers. Whoever masters these fundamentals builds agents that act reliably and under control in production.

FAQ

What is the difference between Tool Calling and Function Calling?
The terms are used largely synonymously. Function Calling emphasizes the technical mechanics (the model produces a function call in a JSON schema), Tool-Use describes the same thing from the agent's perspective (the model uses a tool). In practice, both mean the same process: the LLM returns a structured call that the application executes.
Does the LLM call the tools itself?
No. The LLM only proposes a call - it produces the function name and parameters. The actual execution is handled by the surrounding code (the executor), which also checks permissions, validates parameters and returns the result. This separation is the foundation of every security architecture for agents.
What does MCP have to do with Tool Calling?
MCP (Model Context Protocol) is the open standard that unifies the connection between agent and tool. Instead of programming every integration individually, an MCP server provides tools in a standardized way that the agent discovers at runtime and calls via the usual Function-Calling pattern. MCP is the how behind Tool Calling once tools are available as reusable servers.
How many tools should an agent have at the same time?
3-7 actively loaded tools are 2026 best practice. From around 10 tools onward, measurable degradation in tool choice begins, because the model increasingly struggles to select the right tool. Better than one large static set is dynamic tool discovery: only the tools relevant to the current step are loaded.
How do you secure tool calls?
Across several layers: least-privilege scopes per tool and separate credentials, input validation and allowlists in the executor, a human-in-the-loop or confirmation step for writing and irreversible actions, encapsulation of untrusted tool outputs (against tool injection), as well as observability that logs every call with parameters and result.
What is the most common source of error in Tool Calling?
Not the model, but the tool definition. Unclear or overlapping descriptions lead the agent to choose the wrong tool. The remedy lies in descriptive names, tightly typed parameters and, above all, an unambiguous when-not-to-use clause per tool - that has a stronger effect than any prompt trick.

Want to go deeper?

Get new analyses straight to your inbox – or see how we put this knowledge to work for companies.