5.4 · Intermediate · 7 min

Agent Handoffs: Passing Context and State Between Agents

Blck Alpaca·9 June 2026

Definition

An agent handoff is the controlled transfer of a task, together with its context and state, from one AI agent to another. Instead of answering a query itself, an agent passes the dialogue to a specialised agent. What matters is which context is transferred: full history, summary or structured state.

Key Takeaways

✓ A handoff transfers not only the task but also context and state - the way the transfer is handled determines whether the multi-agent system succeeds or fails.
✓ Three strategies: full conversation history (highest fidelity, expensive), compressed summary (Anthropic Research Agent pattern, token-efficient) and structured state (typed schema, most robust).
✓ In the OpenAI Agents SDK (March 2025), handoffs are an explicit first-class concept: one agent's turn ends and another takes over with the history passed along - evolved from the Swarm framework (October 2024).
✓ The main sources of failure are context fragmentation through excessive compression, endless handoffs (resource deadlock) and cascading failures, in which an incorrect fact propagates through the chain.
✓ Best practice (as of 2026): keep the write path single-threaded, force sub-agent outputs into a typed schema, compress before returning, and give every handoff a trace ID plus a timeout.

An agent handoff is the controlled transfer of a task, together with its context and state, from one AI agent to another. Instead of finishing a request itself, an agent passes the ongoing dialogue to a specialised agent. The central design question is not whether but what is transferred: the full conversation history, a compressed summary or a structured state. This decision determines, more than any other, whether a multi-agent system works or fails due to context loss.

What is transferred? Context (the dialogue so far, decisions, intermediate results) and state (structured data, task status, variables) - in one of three fidelity levels.
Who takes control? Unlike a tool call, after a handoff the receiving agent continues the dialogue; the handing-over agent relinquishes its turn.
What goes wrong? Context loss through over-compression, endless handoffs (resource deadlock) and cascading failures, in which an incorrect fact travels through the chain.

Three handoff strategies: full history, summary, structured state

The choice of transfer strategy is, at its core, a trade-off between information fidelity and token cost - and it depends on the state coupling between the subtasks.

1. Full conversation history. The receiving agent receives the complete trace so far - all messages, tool calls and intermediate decisions. For tightly coupled work (in which every action shapes subsequent decisions), Cognition.ai advocates exactly this position: share complete agent traces, not just individual messages. The reasoning is blunt: apparent disagreements between agents are usually symptoms of context fragmentation, not genuine disagreement. Implicit decisions that are not passed along lead to incompatible subsequent decisions. The price: high token consumption that grows with every handoff.

2. Compressed summary. The handing-over agent condenses its context into a summary before transferring. This is the reference pattern of the Anthropic Research Agent (documented in "How we built our multi-agent research system", 13 June 2025): each sub-agent works in its own context window of around 200k tokens and returns a compressed summary to the lead agent - never the full transcript. The lead synthesises the answer from these. This keeps its context window reserved for the plan, sub-agent results and synthesis rather than raw search snippets. Token-efficient and parallelisable, but with the risk of context loss if condensed too aggressively.

3. Structured state. Instead of free text, a typed schema is transferred (a Pydantic model, JSON schema or an A2A artifact). This is the most robust option for clearly defined handoffs, because the format enforces completeness and nothing is implicitly lost. LangGraph passes a typed state object through its nodes and checkpoints it durably - an interrupted workflow resumes at the last checkpoint after a server restart.

Strategy	Information fidelity	Token cost	Suitable for	Main risk
Full history	Highest	High, grows per handoff	Tightly coupled, sequential tasks	Context bloat, cost explosion
Compressed summary	Medium	Low	Parallelisable read-mostly tasks	Context loss through over-compression
Structured state	High (for defined fields)	Low–medium	Pipelines, clear contracts	Loss of non-modelled information
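
The trade-off in the table can be made concrete with a small sketch. It builds the same handoff payload in all three ways and compares rough sizes. Everything here is an assumption for illustration: the `Message` and `ClaimState` types, the sample dialogue, the truncation that stands in for a real model-written summary, and the rule of thumb of about four characters per token.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Message:
    role: str      # "user", "assistant" or "tool"
    content: str


@dataclass
class ClaimState:
    """Hypothetical typed state for the structured-state strategy."""
    case_id: str
    status: str
    claim_amount: float


def full_history(messages: list[Message]) -> str:
    # Strategy 1: every message crosses the boundary unchanged.
    return json.dumps([asdict(m) for m in messages])


def compressed_summary(messages: list[Message], max_chars: int = 120) -> str:
    # Strategy 2: a real system would ask a model to summarise. As a
    # stand-in, keep only the assistant's statements and truncate them.
    decisions = " ".join(m.content for m in messages if m.role == "assistant")
    return decisions[:max_chars]


def structured_state(state: ClaimState) -> str:
    # Strategy 3: only the fields defined in the schema are transferred.
    return json.dumps(asdict(state))


def rough_tokens(payload: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(payload) // 4)


history = [
    Message("user", "My freezer failed during the storm, I am claiming 480 AUD."),
    Message("tool", "policy_lookup -> policy P-77 covers food spoilage up to 500 AUD"),
    Message("assistant", "Coverage confirmed for policy P-77; claim is within the limit."),
]
state = ClaimState(case_id="C-1", status="coverage_ok", claim_amount=480.0)

for name, payload in [
    ("full history", full_history(history)),
    ("summary", compressed_summary(history)),
    ("structured state", structured_state(state)),
]:
    print(f"{name}: ~{rough_tokens(payload)} tokens")
```

Note what the cheaper payloads drop: the summary no longer mentions the storm, and the structured state carries nothing that was not modelled as a field. That is the "main risk" column in practice.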

Handoffs in the OpenAI Agents SDK and the Swarm model

The OpenAI Agents SDK (March 2025) introduces handoffs as an explicit first-class concept. It is the renamed and extended successor to OpenAI Swarm (October 2024), the reference framework for the peer-to-peer/swarm pattern. The semantics are simple: one agent's turn ends and another agent's turn begins, with the conversation history passed along. For example, a triage agent decides that a request belongs to the billing agent and hands it over; from then on the billing agent leads the dialogue.

This sharply distinguishes handoffs from the tool call: with a tool call, the same agent retains control and only gets a result back. With a handoff, responsibility changes hands. For comparison, the message-passing semantics of the most important stacks:

Stack	Handoff/transfer semantics
OpenAI Agents SDK	Explicit handoffs; turn change, history is passed along
AutoGen	Group chat with turn-taking; shared history; addressing via mentions
LangGraph	Typed state object through nodes; checkpointed; durable across restarts
Anthropic Claude Agent SDK	Lead spawns sub-agents via the Task tool; return of compressed structured outputs
A2A (cross-vendor)	Task lifecycle `submitted → working → input-required → completed/failed/canceled`; multi-part messages; artifacts as output; the agents' internals remain opaque
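
The difference between a tool call and a handoff can be sketched as a tiny run loop. This is not the OpenAI Agents SDK API: `Agent`, `ToolCall`, `Handoff`, `Reply`, `run` and the two hard-coded policies are hypothetical names that only illustrate the turn-change semantics described above.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class Handoff:
    target: str   # name of the agent that takes over the dialogue


@dataclass
class Reply:
    text: str


Action = Union[ToolCall, Handoff, Reply]


@dataclass
class Agent:
    name: str
    decide: Callable[[list[str]], Action]   # policy: history -> next action


def run(agents: dict[str, Agent], tools: dict[str, Callable[[str], str]],
        start: str, user_message: str, max_steps: int = 10) -> tuple[str, str]:
    history = [f"user: {user_message}"]
    active = agents[start]
    for _ in range(max_steps):
        action = active.decide(history)
        if isinstance(action, ToolCall):
            # Tool call: the result is appended and the SAME agent keeps control.
            result = tools[action.name](action.argument)
            history.append(f"tool[{action.name}]: {result}")
        elif isinstance(action, Handoff):
            # Handoff: this agent's turn ends; the target continues with the history.
            history.append(f"handoff: {active.name} -> {action.target}")
            active = agents[action.target]
        else:
            return active.name, action.text
    raise RuntimeError("step limit reached without a reply")


def triage_policy(history: list[str]) -> Action:
    return Handoff("billing")


def billing_policy(history: list[str]) -> Action:
    if not any(h.startswith("tool[invoice_lookup]") for h in history):
        return ToolCall("invoice_lookup", "INV-1")
    return Reply("Invoice INV-1 was refunded.")


agents = {
    "triage": Agent("triage", triage_policy),
    "billing": Agent("billing", billing_policy),
}
tools = {"invoice_lookup": lambda invoice_id: f"{invoice_id} status=refunded"}

answered_by, text = run(agents, tools, "triage", "Where is my refund?")
print(answered_by, "->", text)
```

The triage agent never sees the tool result: after the handoff, the billing agent owns the dialogue, calls its tool and answers the user.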

For transfers across vendor boundaries (for example a Salesforce agent calling an SAP Joule agent), the contract is not the SDK handoff but the A2A protocol. There, the task lifecycle defines the states, the target agent publishes an AgentCard with its capabilities, and the result comes back as an artifact - the internal prompts and models deliberately remaining hidden.
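
The task lifecycle can be read as a small state machine. The sketch below uses the state names from the table above; the transition table, the `Task` class and the shape of the artifact dictionary are assumptions for illustration, and the A2A specification remains the authority on which transitions are actually permitted.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED}

# Assumed transition table (illustrative, not copied from the specification).
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED} | TERMINAL,
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
}


class Task:
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.state = TaskState.SUBMITTED
        self.artifact = None

    def transition(self, new_state: TaskState) -> None:
        # Terminal states have no entry in ALLOWED, so nothing may follow them.
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state

    def complete(self, artifact: dict) -> None:
        # The result travels back as an artifact; internals stay hidden.
        self.transition(TaskState.COMPLETED)
        self.artifact = artifact


task = Task("task-1")
task.transition(TaskState.WORKING)
task.transition(TaskState.INPUT_REQUIRED)   # the remote agent asks a follow-up question
task.transition(TaskState.WORKING)          # the caller answered, work resumes
task.complete({"name": "result", "text": "done"})
print(task.state.value, task.artifact)
```

Making `input-required` an explicit state is what lets the caller distinguish "still working" from "waiting for me", which matters for the deadlock case discussed in the next section.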

Sources of failure: context loss and endless handoffs

Handoffs introduce classes of failure that do not exist in a single-agent system. Four of them relate directly to handoffs:

Context fragmentation. Sub-agents make incompatible implicit decisions because context was lost during the transfer. Remedy: share full traces where coupling is high; explicit decision contracts.
Cascading failures. One agent hallucinates a fact; the next adopts it as truth; downstream, further agents act on the false premise. Remedy: a verifier/judge agent, grounded retrieval sources, mandatory citations.
Resource deadlock / endless handoffs. Agents wait on each other in a circular fashion (A waits for B's result, B for a follow-up question from A) or repeatedly push a task back and forth. Remedy: a timeout on every task, an explicit input-required state (A2A), a hard limit on handoff steps per run.
Authority confusion. A sub-agent overrides the lead's instructions, or the lead loses authority over the dialogue. Remedy: clear role hierarchies and strong AgentCard/task contracts.
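
The remedies for endless handoffs (a timeout, a hard step limit, and detection of a task being pushed back and forth) fit into one small guard object. This is a minimal sketch under assumptions: `HandoffGuard`, `HandoffRejected` and the default limits are hypothetical and not taken from any framework.

```python
import time
from collections import Counter


class HandoffRejected(RuntimeError):
    """Raised when a handoff would violate one of the run's limits."""


class HandoffGuard:
    def __init__(self, max_handoffs: int = 8, max_repeats: int = 2,
                 timeout_s: float = 30.0, clock=time.monotonic):
        self.max_handoffs = max_handoffs
        self.max_repeats = max_repeats
        self.clock = clock
        self.deadline = clock() + timeout_s
        self.total = 0
        self.edges = Counter()   # how often each (source, target) pair was used

    def check(self, source: str, target: str) -> None:
        """Call before every handoff; raises instead of letting the run loop."""
        if self.clock() > self.deadline:
            raise HandoffRejected("run timed out")
        self.total += 1
        if self.total > self.max_handoffs:
            raise HandoffRejected("handoff step limit reached")
        self.edges[(source, target)] += 1
        if self.edges[(source, target)] > self.max_repeats:
            raise HandoffRejected(f"ping-pong detected on {source} -> {target}")


# Two agents that keep pushing the task back and forth.
guard = HandoffGuard(max_handoffs=8, max_repeats=2)
accepted = 0
stopped_because = None
try:
    for _ in range(10):
        guard.check("A", "B")
        accepted += 1
        guard.check("B", "A")
        accepted += 1
except HandoffRejected as err:
    stopped_because = str(err)

print(accepted, stopped_because)
```

When the guard trips, a sensible fallback is to escalate to the lead agent or a human rather than retrying, since a retry would re-enter the same loop.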

An additional security aspect: every new agent context window is a new attack surface. Prompt-injection amplification scales linearly with the number of agents that ingest untrusted content - the EchoLeak class (CVE-2025-32711, a Microsoft 365 Copilot vulnerability disclosed by Aim Labs in June 2025) is the documented reference case here.

Best practices for robust handoffs

From the Anthropic Research Agent design and the Cognition synthesis (as of 2026), a practical checklist can be derived:

Keep the write path single-threaded. Many agents may read, research and propose - but only one agent or one pipeline stage commits. Cognition's updated position (April 2026): multiple agents contribute intelligence while the writes remain single-threaded. Parallel write swarms fragment the context.
Force outputs into a typed schema. Sub-agent results as a Pydantic model, JSON schema or A2A artifact, not as free text.
Compress before returning. Never pipe a full sub-agent transcript back to the lead.
Teach delegation explicitly. Each sub-agent receives a goal, output format, tool/source hints and clear task boundaries. At Anthropic, vague delegations caused duplicated work and gaps in coverage.
Cap the token budget per sub-agent and give every handoff a trace ID plus a timeout - for traceability and against endless loops.
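
Several items on this checklist can be combined in one handoff contract. The sketch below uses only the standard library; in practice a Pydantic model or JSON schema would play the same role. `HandoffEnvelope`, `SubAgentResult`, `parse_result` and all field names are hypothetical, and the truncation stands in for a real summarisation step.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field


@dataclass(frozen=True)
class HandoffEnvelope:
    """What the lead sends: explicit delegation plus limits."""
    goal: str
    output_format: str
    timeout_s: float
    token_budget: int
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class SubAgentResult:
    """What the sub-agent must return: a typed schema, not free text."""
    summary: str
    sources: tuple[str, ...]
    confidence: float

    def __post_init__(self):
        if not self.summary.strip():
            raise ValueError("summary must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def parse_result(raw: str, max_summary_chars: int = 500) -> SubAgentResult:
    data = json.loads(raw)
    # A missing field raises KeyError: the schema enforces completeness.
    return SubAgentResult(
        # Compress before returning: cap what flows back to the lead.
        summary=str(data["summary"])[:max_summary_chars],
        sources=tuple(data["sources"]),
        confidence=float(data["confidence"]),
    )


envelope = HandoffEnvelope(
    goal="Summarise the coverage rules for policy P-77",
    output_format="JSON with summary, sources, confidence",
    timeout_s=60.0,
    token_budget=20_000,
)
raw = json.dumps({
    "summary": "Food spoilage is covered up to 500 AUD.",
    "sources": ["policy.pdf"],
    "confidence": 0.9,
})
result = parse_result(raw)
print(envelope.trace_id, asdict(result))
```

The trace ID is generated once by the lead and should be logged by every agent that touches the task, so that a single run can be reconstructed across the chain.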

Example: claims processing with seven agents (Allianz Project Nemo)

Allianz Project Nemo processes food-spoilage claims after natural disasters with seven specialised agents: Planner, Cyber, Coverage, Weather, Fraud, Payout, Audit. The workflow runs as a hierarchical chain of handoffs - the Planner breaks down the case, hands over to Coverage and Weather for assessment, to Fraud for fraud checks, and finally to Payout. The dedicated Audit agent produces a complete summary of all agent decisions and rationales, thus building the audit trail into the agent topology rather than only into the logging.

Concrete figures (deployed in Australia, July 2025, live in under 100 days): the entire seven-agent workflow is completed in under five minutes, and the processing and settlement time for eligible claims under AUD 500 fell by 80 per cent. Crucially, the final payout remains human-in-the-loop - a caseworker reviews the audit summary. This is single-threaded write in its purest form: six agents contribute intelligence, one (plus a human) decides.

Pseudocode of the handoff core:

```
planner.handoff(coverage, state={case_id, policy, claim_amount})
coverage.handoff(fraud, state + {coverage_ok: true})
fraud.handoff(payout, state + {fraud_score: 0.04})
payout.handoff(audit, state + {recommendation: "pay out", amount: 480})
audit -> human_review(summary) # the only write commit
```
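
A runnable rendering of the same chain might look as follows. It is a sketch of the structured-state handoff only, not of Allianz's implementation: the `handoff`, `audit_summary` and `human_review` helpers, the `trail` field and the sample `case_id` and `policy` values are assumptions, while `fraud_score` and `amount` mirror the pseudocode.

```python
import uuid


def handoff(sender: str, receiver: str, state: dict, **updates) -> dict:
    """Return a new state for the receiver and record the hop in the trail."""
    hop = f"{sender} -> {receiver}"
    return {**state, **updates, "trail": state.get("trail", ()) + (hop,)}


def audit_summary(state: dict) -> str:
    # The audit agent condenses all decisions into one reviewable line.
    return (
        f"case {state['case_id']}: {state['recommendation']} {state['amount']} "
        f"(coverage_ok={state['coverage_ok']}, fraud_score={state['fraud_score']}) "
        f"via {len(state['trail'])} handoffs"
    )


def human_review(summary: str, approve: bool) -> str:
    # The only write commit: a caseworker approves or rejects.
    return "committed" if approve else "rejected"


# Illustrative starting state; every handoff copies and extends it.
state = {"trace_id": uuid.uuid4().hex, "case_id": "C-1", "policy": "P-77", "claim_amount": 480}
state = handoff("planner", "coverage", state)
state = handoff("coverage", "fraud", state, coverage_ok=True)
state = handoff("fraud", "payout", state, fraud_score=0.04)
state = handoff("payout", "audit", state, recommendation="pay out", amount=480)

summary = audit_summary(state)
decision = human_review(summary, approve=True)
print(summary)
print(decision)
```

Because each handoff returns a new dictionary instead of mutating a shared one, no agent can silently overwrite an earlier decision, and the trail doubles as a minimal audit log.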

For agencies and B2B decision-makers

Anyone planning multi-agent workflows for production - whether as a DACH mid-sized company with one or two use cases, or as an agency productising agentic services - should make the handoff strategy the first architectural decision. The rule of thumb: default to structured state or compressed summary, full history only with high coupling; keep the write path single-threaded; give every handoff a timeout, step limit and trace ID. As an agency, it is worth making your own agents A2A-capable with published AgentCards, so that client systems (Agentforce, Joule, Copilot Studio) can address them without custom integration. Blck Alpaca supports DACH B2B teams in designing such handoff architectures - from pattern selection to observability across the entire agent chain.

FAQ

What is the difference between a handoff and a normal tool call?

A tool call invokes a function or data source and returns a result to the same agent, which retains control. With a handoff, by contrast, the agent transfers the conversation and ownership of the task to another agent - the receiving agent continues the dialogue. In the OpenAI Agents SDK, this is precisely the handoff semantics: one agent's turn ends and the next begins with the history passed along.

Full history or summary - which handoff strategy is better?

It depends on the state coupling. With high coupling, where every decision shapes subsequent decisions, Cognition recommends sharing complete agent traces rather than just individual messages, because fragmented context leads agents to make incompatible implicit decisions. With low coupling and parallelisable tasks, the Anthropic Research Agent pattern uses compressed summaries to save tokens. Structured state is the most robust option for clearly defined handoffs.

How do you prevent endless handoffs between agents?

Endless handoffs occur as a resource deadlock (agent A waits for B's result, B waits for a follow-up question from A) or as a task being repeatedly passed back and forth. The remedies are: a timeout on every task, an explicit input-required state as in the A2A protocol, a maximum number of handoff steps per run, and clear role hierarchies so that one agent retains authority.

What is context loss in a handoff and why is it dangerous?

Context loss occurs when relevant information is dropped during the transfer - for example because it was summarised too aggressively or only individual messages were passed along instead of the full trace. Cognition describes how apparent disagreements between agents are usually symptoms of context fragmentation. The receiving agent then makes incompatible implicit decisions that lead to faulty results.

Which frameworks support agent handoffs natively?

The OpenAI Agents SDK (March 2025, evolved from Swarm) uses explicit handoffs as a core concept. LangGraph passes a typed state object through nodes and checkpoints it durably. AutoGen works with group chat and turn-taking, including shared history. The Anthropic Claude Agent SDK spawns sub-agents via the Task tool, which return compressed structured outputs. Across vendors, the handoff runs over the A2A protocol with its task lifecycle.
