Orchestrator–Worker: The Most Robust Multi-Agent Pattern
The orchestrator-worker pattern is a multi-agent set-up in which a lead agent (orchestrator) breaks a task into subtasks, delegates them to specialised worker agents each with their own context window, and consolidates their compressed results into a single answer. It is regarded as the most robust pattern for decomposable, breadth-search tasks such as research and reporting.
Key Takeaways
- An orchestrator breaks down the task and delegates clearly defined subtasks to worker agents, which work in parallel each with their own context window and return only compressed results.
- For its research system (lead Claude Opus 4 plus parallel Claude Sonnet 4 workers), Anthropic documented a gain of 90.2 percent on internal breadth metrics over a single agent, at roughly 15x the token consumption (as of June 2025).
- The pattern suits parallelisable tasks with low coupling (research, market monitoring, multi-document review) and fails on tightly coupled writing tasks such as code generation.
- Robustness comes from isolated worker contexts plus single-threaded synthesis by the orchestrator; this avoids the context fragmentation of parallel writing swarms.
- The main cost pitfalls are vague delegations, orchestrator drift and prompt injection per worker context; the countermeasures are typed outputs, token caps and precise task briefings.
- Production-ready today at Anthropic, AWS Bedrock (supervisor pattern) and Salesforce Agentforce; token factor typically 5 to 15x (as of 2026).
In the Anthropic taxonomy, the orchestrator-worker pattern is building block number five and is regarded as the canonical multi-agent pattern. It is also the most robust, because it sidesteps the most common source of error in distributed agents: several agents writing at the same time.
- What happens: the orchestrator plans and decomposes, the workers operate in parallel and independently, and the orchestrator alone aggregates and writes.
- What it suits: parallelisable tasks with low coupling between the subtasks, such as research, market monitoring, multi-document review and broad triage.
- What it does not suit: tightly coupled, sequential writing tasks such as code generation across multiple files or complex transaction workflows.
How the pattern works
The orchestrator runs in a planning or extended-thinking mode. It receives the request, breaks it down into discrete, mutually independent subtasks and spawns a worker agent for each via a Task or subagent tool. The strict isolation is decisive: each worker has its own prompt, its own toolset and, above all, its own context window. The workers research, call tools and reason independently of one another. Instead of their full transcripts, they return only a compressed result to the orchestrator, which synthesises the final answer from these.
This architecture is the exact opposite of an undisciplined agent swarm. The worker contexts are separated so that parallelism becomes possible in the first place. The write path, however, remains single-threaded: only the orchestrator consolidates the fragments. It is precisely this combination, isolated reading workers plus single-threaded synthesis, that makes the pattern resilient.
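The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `decompose`, `run_worker` and `orchestrate` are hypothetical names, the workers are stubs that make no model or tool calls, and threads stand in for whatever subagent mechanism a real stack provides. What it shows is the shape of the pattern: one private context per worker, a compressed return value, and a single place where results are merged.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    subtask: str
    summary: str  # compressed result, never the full transcript


def decompose(request: str) -> list[str]:
    """Hypothetical planner: splits on ';'. A real orchestrator would plan with a model."""
    return [part.strip() for part in request.split(";") if part.strip()]


def run_worker(subtask: str) -> WorkerResult:
    """Hypothetical worker stub with its own, unshared context."""
    context: list[str] = []  # private to this worker
    context.append(f"brief: {subtask}")
    context.append(f"tool output for {subtask} (placeholder for search/fetch)")
    # Compression step: only a short summary leaves the worker.
    summary = f"{subtask}: {len(context)} context entries condensed"
    return WorkerResult(subtask=subtask, summary=summary)


def orchestrate(request: str, max_workers: int = 4) -> str:
    subtasks = decompose(request)
    # Fan-out: workers run in parallel and never see each other's context.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_worker, subtasks))  # map preserves input order
    # Single-threaded write path: only the orchestrator assembles the answer.
    return "\n".join(f"- {r.summary}" for r in results)


if __name__ == "__main__":
    print(orchestrate("pricing of A; pricing of B; pricing of C"))
```

The design point is that `run_worker` returns a `WorkerResult` rather than its `context` list, so the orchestrator's own context grows by one summary per worker, not by one transcript per worker.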
Why it is regarded as the most robust multi-agent pattern
The robustness claim grew out of the engineering discourse of 2025. Cognition.ai argued in "Don't Build Multi-Agents" (12 June 2025) that parallel sub-agents are fragile by default because they fragment their context. Several agents writing simultaneously make contradictory implicit decisions, for example on style or edge cases, and produce results that no one can consolidate any longer. Cognition's central thesis: "Apparent disagreements between agents are usually symptoms of context fragmentation."
This is precisely where the orchestrator-worker pattern comes in. It avoids competing writes entirely. The workers contribute intelligence by reading, gathering and proposing, but only the orchestrator reconciles and writes. In its update "Multi-Agents: What's Actually Working" (22 April 2026), Cognition confirmed this position: the company itself uses multi-agent systems, but precisely the variant "in which several agents contribute intelligence to a task while the writes remain single-threaded." Orchestrator-worker is the clean implementation of this principle. It is not the most powerful pattern for every case, but it is the most reliable for the class of tasks for which multi-agent makes sense at all.
The reference example: Anthropic Research Agent
The canonical case study is Anthropic's multi-agent research system, documented in "How we built our multi-agent research system" (13 June 2025) and in production in the Research feature of Claude.ai. The set-up:
- Lead orchestrator: Claude Opus 4 in extended-thinking mode, which breaks down the user request and delegates.
- Workers: N parallel Claude Sonnet 4 sub-agents, spawned via the Task tool. Each worker has its own context window of around 200,000 tokens and uses search and fetch tools independently.
- Return: each worker delivers a compressed summary, not a full transcript. The orchestrator synthesises the answer from these.
The figures from Anthropic's internal evaluations are the most-cited argument for the pattern:
| Metric | Value |
|---|---|
| Quality gain on breadth metrics vs. single-agent Claude Opus 4 | +90.2 % |
| Token consumption vs. single agent | roughly 15x |
| Context window per worker | approx. 200,000 tokens |
| Lead model / worker model | Claude Opus 4 / Claude Sonnet 4 |
Anthropic was explicit: this token expenditure pays off only for high-value, parallelisable, breadth-oriented research tasks. For tightly coupled coding workflows, Anthropic expressly does not recommend the pattern (all figures as of June 2025).
Roles in the orchestrator-worker pattern
| Role | Task |
|---|---|
| Orchestrator (lead agent) | Interpret the request, break it down into independent subtasks, dynamically spawn workers, brief delegations precisely |
| Worker agent (sub-agent) | Solve a clearly defined subtask in its own context window, use tools autonomously, deliver a compressed result |
| Synthesis (by the orchestrator) | Reconcile worker results, identify gaps, produce the final answer as the sole writer |
| Verifier-judge (optional) | Evaluate trajectories against a rubric (task fulfilled, answer well-founded, within budget) as an eval signal |
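The optional verifier-judge role can be illustrated with a small rubric check. This is a sketch under assumptions: the `Trajectory` fields, the `judge` function and the 0.8 grounding threshold are hypothetical, and in practice the "task fulfilled" and "well-founded" verdicts would typically come from a model-based grader rather than from counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    task_fulfilled: bool      # assumed to be set by an upstream grader
    claims_total: int         # claims made in the final answer
    claims_with_source: int   # claims backed by a cited source
    tokens_used: int
    token_budget: int


def judge(t: Trajectory, min_grounding: float = 0.8) -> dict[str, bool]:
    """Score one run against the three rubric items named in the table."""
    grounded = (
        t.claims_total > 0
        and t.claims_with_source / t.claims_total >= min_grounding
    )
    return {
        "task_fulfilled": t.task_fulfilled,
        "well_founded": grounded,
        "within_budget": t.tokens_used <= t.token_budget,
    }


if __name__ == "__main__":
    print(judge(Trajectory(True, 10, 9, 40_000, 50_000)))
```

Returning one boolean per rubric item, rather than a single score, keeps the eval signal easy to trace back to the failing criterion.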
When the pattern fits, and when it does not
It fits breadth-oriented, parallelisable problems whose subtasks are independent: research, market intelligence, multi-document review, broad triage, finding comparable documents. These tasks have low state coupling between the subproblems, and the write path can trivially be kept single-threaded.
It does not fit tightly coupled writing tasks (code generation across multiple files, complex transaction workflows) or high-determinism workflows with a reproducibility obligation. There, a deterministic pipeline is the better choice.
The pattern selection matrix of the underlying research source classifies orchestrator-worker as follows: high parallelisability, low to medium tolerated state coupling, token cost factor 5x to 15x, medium latency tolerance, medium evaluability (requires trace tooling), production-ready today at Anthropic, AWS Bedrock and Agentforce (as of 2026).
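The fit criteria from this section can be written down as a simple checklist function. The sketch below is an illustration only: `TaskProfile`, its field names and `fit_objections` are hypothetical, and the three coupling levels are a simplification of the "low to medium tolerated state coupling" classification quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    parallelisable: bool         # can the subtasks run independently?
    coupling: str                # "low", "medium" or "high" state coupling
    parallel_writes: bool        # would several agents write the output at once?
    needs_reproducibility: bool  # is there a reproducibility obligation?


def fit_objections(profile: TaskProfile) -> list[str]:
    """Return the reasons against orchestrator-worker; an empty list means it fits."""
    objections = []
    if not profile.parallelisable:
        objections.append("subtasks cannot run in parallel")
    if profile.coupling == "high":
        objections.append("state coupling between subtasks is too tight")
    if profile.parallel_writes:
        objections.append("write path is not single-threaded")
    if profile.needs_reproducibility:
        objections.append("reproducibility obligation favours a deterministic pipeline")
    return objections


if __name__ == "__main__":
    research = TaskProfile(True, "low", False, False)
    multi_file_code = TaskProfile(False, "high", True, False)
    print(fit_objections(research))         # []
    print(len(fit_objections(multi_file_code)))  # 3
```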
Pros and cons, costs
Advantages: a dramatic quality gain on breadth tasks, genuine parallelism through isolated worker contexts, avoidance of context fragmentation through single-threaded synthesis, production-ready across several stacks.
Disadvantages and costs: token consumption is the central objection, roughly 15x in the Anthropic case. On top of this come typical sources of error:
- Vague delegations. Imprecise task briefings led to duplicated work and coverage gaps in Anthropic's early experiments. The orchestrator must explicitly learn how to delegate.
- Orchestrator drift. Over long runs the lead loses track of its plan and re-spawns workers for tasks already completed.
- Prompt-injection amplification. Every new worker context window is a new attack surface. If workers ingest untrusted content, the injection risk scales with the number of agents.
The practical counter-recipe from the research source for teams with a limited budget:
- Cap the worker token budget explicitly (Bedrock AgentCore, LangGraph and Claude Agent SDK support this).
- Force worker outputs into a typed schema (Pydantic, JSON schema or A2A artefact).
- Compress before returning, never pass a full worker transcript back to the orchestrator.
- Brief the orchestrator precisely: each worker receives an objective, output format, tool and source pointers, and clear task boundaries.
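The four countermeasures above can be sketched together with standard-library types. This is a hedged illustration, not the API of any of the frameworks named: `TaskBrief`, `WorkerOutput` and `parse_worker_output` are hypothetical, plain dataclasses stand in for Pydantic or JSON Schema, and the token check only validates the figure a worker reports. Actually enforcing a cap is the job of the runtime that executes the worker.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBrief:
    """What each worker receives: objective, format, tools, sources, boundaries, cap."""
    objective: str
    output_format: str
    tools: tuple[str, ...]
    source_pointers: tuple[str, ...]
    boundaries: str
    max_tokens: int


@dataclass(frozen=True)
class WorkerOutput:
    """Typed, compressed result: findings and sources only, no transcript."""
    objective: str
    findings: tuple[str, ...]
    sources: tuple[str, ...]
    tokens_used: int


def parse_worker_output(raw: str, brief: TaskBrief) -> WorkerOutput:
    """Reject anything that does not match the schema or exceeds the token cap."""
    data = json.loads(raw)
    expected = {"objective", "findings", "sources", "tokens_used"}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError("worker output does not match the expected schema")
    if not isinstance(data["tokens_used"], int):
        raise ValueError("tokens_used must be an integer")
    if data["tokens_used"] > brief.max_tokens:
        raise ValueError("worker exceeded its token budget")
    return WorkerOutput(
        objective=str(data["objective"]),
        findings=tuple(str(f) for f in data["findings"]),
        sources=tuple(str(s) for s in data["sources"]),
        tokens_used=data["tokens_used"],
    )


if __name__ == "__main__":
    brief = TaskBrief(
        objective="Summarise pricing of competitor A",
        output_format="JSON with objective, findings, sources, tokens_used",
        tools=("search", "fetch"),
        source_pointers=("public pricing pages",),
        boundaries="Pricing only; no product features",
        max_tokens=50_000,
    )
    raw = json.dumps({
        "objective": brief.objective,
        "findings": ["Three tiers", "Annual discount"],
        "sources": ["pricing page"],
        "tokens_used": 31_000,
    })
    print(parse_worker_output(raw, brief))
```

Because a full transcript has no place in the `WorkerOutput` schema, the typed output and the "compress before returning" rule reinforce each other.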
Worked example with figures
Suppose a competitive analysis across 30 market players. A single agent consumes an estimated 200,000 tokens for this in a sequential run. In the orchestrator-worker set-up, the lead breaks the task down into ten parallel workers, each handling three competitors. At a token factor of roughly 15x, total consumption comes to around 3 million tokens (200,000 × 15). The quality gain on breadth coverage documented by Anthropic is roughly 90 percent. The decision thus comes down to one question: does the 15x token price pay off for almost doubled coverage quality? For a one-off, business-critical market study, yes; for a daily routine report, almost always no. (The token values in the example are illustrative estimates; what is substantiated is the 15x factor and the breadth gain from Anthropic's research system.)
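The arithmetic of this example is easy to reproduce. In the sketch below, `token_range` is a hypothetical helper; the 200,000-token baseline is the article's illustrative estimate, while the 5x to 15x range and the 90.2 percent gain are the figures quoted earlier in the text.

```python
def token_range(single_agent_tokens: int, low_factor: int = 5, high_factor: int = 15) -> tuple[int, int]:
    """Estimated multi-agent token consumption for the quoted factor range."""
    return single_agent_tokens * low_factor, single_agent_tokens * high_factor


if __name__ == "__main__":
    low, high = token_range(200_000)
    print(low, high)  # 1000000 3000000

    workers, competitors = 10, 30
    print(competitors // workers)  # 3 competitors per worker

    # A 90.2 percent gain means roughly 1.9 times the baseline quality score.
    relative_quality = 1 + 90.2 / 100
    print(round(relative_quality, 1))
```

Swapping in a measured baseline and a measured factor from your own runs turns the same three lines into a budget estimate for a concrete project.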
A production DACH proof point for a closely related, hierarchical set-up is Allianz Project Nemo: seven specialised agents (Planner, Cyber, Coverage, Weather, Fraud, Payout, Audit) for processing claims arising from spoiled food after natural disasters. The entire workflow runs in under five minutes; a human case handler reviews the audit summary and decides (human-in-the-loop as explicit policy). Documented result: 80 percent shorter processing and settlement time for eligible cases under 500 AUD, launched in Australia in July 2025 and rolled out in under 100 days.
For agencies and B2B decision-makers
For marketing agencies, orchestrator-worker is the right pattern wherever breadth matters: parallel research on many competitors, content audits across large sets of pages, evaluation of many sources for a single report. Deliberately do not let several agents write the final copy in parallel; the final copy stays single-threaded. For B2B decision-makers, the steering-committee-ready rule of thumb is: use multi-agent in the orchestrator-worker style when the task is parallelisable, has low coupling between the subtasks and keeps the write path single-threaded. Rational defaults for DACH projects are n8n or LangGraph for orchestration, MCP for tools and A2A for cross-platform handoffs. Anyone who wants to understand and implement the most robust multi-agent pattern cleanly should start with clear task decomposition and hard token caps, not with the number of agents.
FAQ
What distinguishes orchestrator-worker from a simple prompt chain?
Why is orchestrator-worker regarded as the most robust multi-agent pattern?
When should you NOT use the pattern?
How high are the costs compared with a single agent?
Which frameworks support orchestrator-worker in production?