Agent Handoffs: Passing Context and State Between Agents
Key Takeaways
- A handoff transfers not only the task but also context and state - how the transfer is handled determines whether the multi-agent system succeeds or fails.
- Three strategies: full conversation history (highest fidelity, expensive), compressed summary (Anthropic Research Agent pattern, token-efficient) and structured state (typed schema, most robust).
- In the OpenAI Agents SDK (March 2025), handoffs are an explicit first-class concept: one agent's turn ends and another takes over with the history passed along - evolved from the Swarm framework (October 2024).
- The main sources of failure are context fragmentation through excessive compression, endless handoffs (resource deadlock) and cascading failures, in which an incorrect fact propagates through the chain.
- Best practice (as of 2026): keep the write path single-threaded, force sub-agent outputs into a typed schema, compress before returning, and give every handoff a trace ID plus a timeout.
An agent handoff is the controlled transfer of a task, together with its context and state, from one AI agent to another. Instead of finishing a request itself, an agent passes the ongoing dialogue to a specialised agent. The central design question is not whether but what is transferred: the full conversation history, a compressed summary or a structured state. This decision determines, more than any other, whether a multi-agent system works or fails due to context loss.
- What is transferred? Context (the dialogue so far, decisions, intermediate results) and state (structured data, task status, variables) - in one of three fidelity levels.
- Who takes control? Unlike a tool call, after a handoff the receiving agent continues the dialogue; the handing-over agent relinquishes its turn.
- What goes wrong? Context loss through over-compression, endless handoffs (resource deadlock) and cascading failures, in which an incorrect fact travels through the chain.
Three handoff strategies: full history, summary, structured state
The choice of transfer strategy is, at its core, a trade-off between information fidelity and token cost - and it depends on the state coupling between the subtasks.
1. Full conversation history. The receiving agent receives the complete trace so far - all messages, tool calls and intermediate decisions. For tightly coupled work (in which every action shapes subsequent decisions), Cognition.ai advocates exactly this position: share complete agent traces, not just individual messages. The reasoning is blunt: apparent disagreements between agents are usually symptoms of context fragmentation, not genuine disagreement. Implicit decisions that are not passed along lead to incompatible subsequent decisions. The price: high token consumption that grows with every handoff.
2. Compressed summary. The handing-over agent condenses its context into a summary before transferring. This is the reference pattern of the Anthropic Research Agent (documented in "How we built our multi-agent research system", 13 June 2025): each sub-agent works in its own context window of around 200k tokens and returns a compressed summary to the lead agent - never the full transcript. The lead synthesises the answer from these. This keeps its context window reserved for the plan, sub-agent results and synthesis rather than raw search snippets. Token-efficient and parallelisable, but with the risk of context loss if condensed too aggressively.
3. Structured state. Instead of free text, a typed schema is transferred (a Pydantic model, JSON schema or an A2A artifact). This is the most robust option for clearly defined handoffs, because the format enforces completeness and nothing is implicitly lost. LangGraph passes a typed state object through its nodes and checkpoints it durably - an interrupted workflow resumes at the last checkpoint after a server restart.
| Strategy | Information fidelity | Token cost | Suitable for | Main risk |
|---|---|---|---|---|
| Full history | Highest | High, grows per handoff | Tightly coupled, sequential tasks | Context bloat, cost explosion |
| Compressed summary | Medium | Low | Parallelisable read-mostly tasks | Context loss through over-compression |
| Structured state | High (for defined fields) | Low–medium | Pipelines, clear contracts | Loss of non-modelled information |
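The three payload shapes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not any framework's API: the `ClaimHandoff` schema and its field names are hypothetical, and `compressed_summary` uses simple truncation as a stand-in for what would normally be an LLM-generated summary.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ClaimHandoff:
    """Hypothetical typed handoff payload; field names are illustrative."""
    case_id: str
    claim_amount: float
    coverage_ok: bool
    notes: str = ""

    def __post_init__(self) -> None:
        # Fail at the handoff boundary rather than somewhere downstream.
        if not self.case_id:
            raise ValueError("case_id must not be empty")
        if self.claim_amount < 0:
            raise ValueError("claim_amount must be non-negative")


def full_history(messages: list[dict]) -> list[dict]:
    """Strategy 1: pass every message along unchanged."""
    return list(messages)


def compressed_summary(messages: list[dict], max_chars: int = 200) -> str:
    """Strategy 2: placeholder for an LLM summary; here plain truncation."""
    text = " ".join(m["content"] for m in messages)
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


def structured_state(state: ClaimHandoff) -> str:
    """Strategy 3: serialise the typed payload for the receiving agent."""
    return json.dumps(asdict(state), sort_keys=True)
```

The typed variant rejects an incomplete payload when it is constructed, which is the practical meaning of "the format enforces completeness"; anything not modelled as a field, however, is silently absent.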
Handoffs in the OpenAI Agents SDK and the Swarm model
The OpenAI Agents SDK (March 2025) makes handoffs an explicit first-class concept. It is the successor to OpenAI Swarm (October 2024), the reference framework for the peer-to-peer/swarm pattern. The semantics are simple: one agent's turn ends, another agent's turn begins, and the conversation history is passed along. For example, a triage agent decides that a request belongs to the billing agent and hands it over; from then on the billing agent leads the dialogue.
This sharply distinguishes handoffs from the tool call: with a tool call, the same agent retains control and only gets a result back. With a handoff, responsibility changes hands. For comparison, the message-passing semantics of the most important stacks:
| Stack | Handoff/transfer semantics |
|---|---|
| OpenAI Agents SDK | Explicit handoffs; turn change, history is passed along |
| AutoGen | Group chat with turn-taking; shared history; addressing via mentions |
| LangGraph | Typed state object through nodes; checkpointed; durable across restarts |
| Anthropic Claude Agent SDK | Lead spawns sub-agents via the Task tool; return of compressed structured outputs |
| A2A (cross-vendor) | Task lifecycle with defined states; AgentCard advertises capabilities; result returned as an artifact |
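The difference in control flow between a tool call and a handoff can be made concrete with a framework-free sketch. The names here (`Conversation`, `lookup_invoice`, `tool_call`, `handoff`) are hypothetical and do not mirror the OpenAI Agents SDK or any other library; the only point is who owns the dialogue afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    history: list[str] = field(default_factory=list)
    active_agent: str = "triage"


def lookup_invoice(invoice_id: str) -> str:
    """Hypothetical tool: returns data, never takes over the dialogue."""
    return f"invoice {invoice_id}: paid"


def tool_call(conv: Conversation, invoice_id: str) -> None:
    # The calling agent stays in control and only records the result.
    result = lookup_invoice(invoice_id)
    conv.history.append(f"{conv.active_agent} used tool -> {result}")


def handoff(conv: Conversation, target: str) -> None:
    # Control changes hands; the history travels with the conversation.
    conv.history.append(f"{conv.active_agent} handed off to {target}")
    conv.active_agent = target


conv = Conversation()
tool_call(conv, "INV-7")
owner_after_tool_call = conv.active_agent   # still the triage agent
handoff(conv, "billing")
owner_after_handoff = conv.active_agent     # now the billing agent
```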
For transfers across vendor boundaries (for example a Salesforce agent calling an SAP Joule agent), the contract is not the SDK handoff but the A2A protocol. There, the task lifecycle defines the states, the target agent publishes an AgentCard with its capabilities, and the result comes back as an artifact - the internal prompts and models deliberately remaining hidden.
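A task lifecycle of this kind can be modelled as a small state machine. The sketch below is an assumption-laden simplification: the state names follow a reading of the A2A task states (only a subset is shown), while the transition table and the `advance` helper are invented for illustration and are not taken from the specification.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Assumed, simplified transition table; terminal states have no successors.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
    TaskState.WORKING: {
        TaskState.INPUT_REQUIRED,
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.CANCELED,
    },
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Making `input-required` an explicit state means a task that is waiting for the caller is visibly waiting, rather than silently blocked.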
Sources of failure: context loss and endless handoffs
Handoffs create classes of failure that a single-agent system does not know. Four of them are directly handoff-relevant:
- Context fragmentation. Sub-agents make incompatible implicit decisions because context was lost during the transfer. Remedy: share full traces where coupling is high; explicit decision contracts.
- Cascading failures. One agent hallucinates a fact; the next adopts it as truth; downstream, further agents act on the false premise. Remedy: a verifier/judge agent, grounded retrieval sources, mandatory citations.
- Resource deadlock / endless handoffs. Agents wait on each other in a circular fashion (A waits for B's result, B for a follow-up question from A) or repeatedly push a task back and forth. Remedy: a timeout on every task, an explicit input-required state (A2A), a hard limit on handoff steps per run.
- Authority confusion. A sub-agent overrides the lead's instructions, or the lead loses authority over the dialogue. Remedy: clear role hierarchies and strong AgentCard/task contracts.
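The remedy for endless handoffs, a step limit plus a timeout, might look like the following sketch. `run_with_guards` and the `route` callable (current agent name in, next agent name or `None` out) are hypothetical. Note that the deadline is only checked between steps, so it bounds the run as a whole but does not interrupt a single agent that blocks.

```python
import time


class HandoffLimitExceeded(RuntimeError):
    """Raised when a run requests more handoffs than allowed."""


def run_with_guards(route, start: str, max_handoffs: int = 8,
                    timeout_s: float = 30.0) -> list[str]:
    """Follow handoffs until an agent finishes, i.e. route() returns None."""
    deadline = time.monotonic() + timeout_s
    path = [start]
    while True:
        if time.monotonic() > deadline:
            raise TimeoutError(f"run exceeded {timeout_s}s after path {path}")
        next_agent = route(path[-1])
        if next_agent is None:
            return path  # the current agent completed the task itself
        if len(path) - 1 >= max_handoffs:
            raise HandoffLimitExceeded(
                f"more than {max_handoffs} handoffs requested: {path}"
            )
        path.append(next_agent)
```

Two agents that keep pushing a task back and forth hit the limit instead of looping forever, and the recorded path shows where the ping-pong happened.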
An additional security aspect: every new agent context window is a new attack surface. The exposure to prompt injection grows with the number of agents that ingest untrusted content - EchoLeak (CVE-2025-32711, a vulnerability in Microsoft 365 Copilot disclosed by Aim Labs in June 2025) is the documented reference case here.
Best practices for robust handoffs
From the Anthropic Research Agent design and the Cognition synthesis (as of 2026), a practical checklist can be derived:
- Keep the write path single-threaded. Many agents may read, research and propose - but only one agent or one pipeline stage commits. Cognition's updated position (April 2026): multiple agents contribute intelligence while the writes remain single-threaded. Parallel write swarms fragment the context.
- Force outputs into a typed schema. Sub-agent results as a Pydantic model, JSON schema or A2A artifact, not as free text.
- Compress before returning. Never pipe a full sub-agent transcript back to the lead.
- Teach delegation explicitly. Each sub-agent receives a goal, output format, tool/source hints and clear task boundaries. At Anthropic, vague delegations caused duplicated work and gaps in coverage.
- Cap the token budget per sub-agent and give every handoff a trace ID plus a timeout - for traceability and against endless loops.
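Several items on this checklist can be combined in one small sketch: sub-agents return a typed, compressed proposal with a trace ID, the lead enforces a token budget, and only the lead commits. All names (`Proposal`, `Lead`, `delegate`, `commit`) and the budget value are hypothetical, and `work` stands in for a real sub-agent call that reports its own token usage.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    """Typed sub-agent output; the schema is illustrative."""
    trace_id: str
    agent: str
    summary: str       # compressed result, not the full transcript
    tokens_used: int


@dataclass
class Lead:
    token_budget: int = 2_000
    committed: list[str] = field(default_factory=list)

    def delegate(self, agent: str, work) -> Proposal:
        # Every handoff gets its own trace ID for later correlation.
        trace_id = uuid.uuid4().hex
        summary, tokens = work()
        if tokens > self.token_budget:
            raise ValueError(f"{agent} exceeded token budget ({tokens})")
        return Proposal(trace_id, agent, summary, tokens)

    def commit(self, proposals: list[Proposal]) -> None:
        # The only write path: sub-agents propose, the lead commits.
        for p in proposals:
            self.committed.append(f"[{p.trace_id[:8]}] {p.agent}: {p.summary}")
```

Sub-agents could run `delegate` in parallel; because `commit` is the single place where state changes, their results cannot overwrite each other.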
Example: claims processing with seven agents (Allianz Project Nemo)
Allianz Project Nemo processes food-spoilage claims after natural disasters with seven specialised agents: Planner, Cyber, Coverage, Weather, Fraud, Payout, Audit. The workflow runs as a hierarchical chain of handoffs - the Planner breaks down the case, hands over to Coverage and Weather for assessment, to Fraud for fraud checks, and finally to Payout. The dedicated Audit agent produces a complete summary of all agent decisions and rationales, thus building the audit trail into the agent topology rather than only into the logging.
Concrete figures (deployed in Australia, July 2025, live in under 100 days): the entire seven-agent workflow is completed in under five minutes, and the processing and settlement time for eligible claims under AUD 500 fell by 80 per cent. Crucially, the final payout remains human-in-the-loop - a caseworker reviews the audit summary. This is single-threaded write in its purest form: six agents contribute intelligence, one (plus a human) decides.
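A minimal Python sketch of the handoff core (the `handoff` helper and the case values are illustrative, not Allianz's implementation):
```python
def handoff(sender, receiver, state, **update):
    """Pass the sender's state plus its new fields to the receiver."""
    return {**state, **update}

state = {"case_id": "C-1", "policy": "P-1", "claim_amount": 480}  # illustrative values
state = handoff("planner", "coverage", state)
state = handoff("coverage", "fraud", state, coverage_ok=True)
state = handoff("fraud", "payout", state, fraud_score=0.04)
state = handoff("payout", "audit", state, recommendation="pay out", amount=480)
print("audit -> human_review:", state)  # the human review is the only write commit
```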
For agencies and B2B decision-makers
Anyone planning multi-agent workflows for production - whether as a DACH mid-sized company with one or two use cases, or as an agency productising agentic services - should make the handoff strategy the first architectural decision. The rule of thumb: default to structured state or compressed summary, full history only with high coupling; keep the write path single-threaded; give every handoff a timeout, step limit and trace ID. As an agency, it is worth making your own agents A2A-capable with published AgentCards, so that client systems (Agentforce, Joule, Copilot Studio) can address them without custom integration. Blck Alpaca supports DACH B2B teams in designing such handoff architectures - from pattern selection to observability across the entire agent chain.
FAQ
What is the difference between a handoff and a normal tool call?
Full history or summary - which handoff strategy is better?
How do you prevent endless handoffs between agents?
What is context loss in a handoff and why is it dangerous?
Which frameworks support agent handoffs natively?