Hierarchical Agents: Supervisor and Sub-Agents
Hierarchical agents are a multi-agent architecture in which a supervisor agent decomposes a complex task, delegates subtasks to specialised sub-agents and merges their results. Instead of a single agent, a higher-level control instance coordinates several subordinate workers and aggregates their output into an overall solution.
Key Takeaways
- A supervisor agent (also orchestrator or manager) plans, decomposes and delegates; specialised sub-agents execute subtasks and return results for aggregation.
- Hierarchy scales where tasks are clearly decomposable and parallelisable, the read share is large and the write/state share is small.
- The main advantages are specialisation and modularity; the central price is coordination overhead from additional LLM calls, context hand-off and potentially contradictory decisions.
- Cognition recommends (as of 2026) the "read-parallel, write-single-threaded" pattern: sub-agents only for parallel information gathering, write operations centralised at the supervisor.
- Anthropic's "Orchestrator-Workers" pattern and CrewAI's Hierarchical Process with manager_agent are the established production implementations (as of 2026).
- Before escalating to hierarchy, first measure the simplest pattern (usually a single ReAct agent); move to multi-agent only when measurable failure modes force it.
The pattern corresponds to what Anthropic calls "Orchestrator-Workers" and what CrewAI implements as the "Hierarchical Process".
- Supervisor: plans, decomposes, delegates, aggregates — does no domain-level detail work itself.
- Sub-agents: narrowly specialised workers with their own prompt and their own tools, each solving one subtask.
- Scaling: worthwhile when tasks are clearly decomposable and parallelisable — otherwise a single agent is cheaper.
How a hierarchical architecture works
At its core, a hierarchy extends the orchestrator-worker logic: a coordinating LLM dynamically decomposes a task into subtasks and distributes them to subordinate units. Anthropic describes this pattern in "Building Effective Agents" (December 2024) under the name "Orchestrator-Workers" as one of five canonical workflow patterns. It belongs to the same family as Plan-and-Execute and is found, among other places, in Claude's coding agents.
The process follows a recurring loop:
- The supervisor receives the task and creates a plan, or a decomposition into subtasks.
- It selects a suitable sub-agent for each subtask and delegates with the necessary context.
- The sub-agents execute their subtask — typically as independent ReAct agents with their own tools — and report a partial result back.
- The supervisor aggregates the partial results, checks for completeness and decides: further delegation or final answer.
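The four steps of this loop can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a framework API: `SUB_AGENTS`, `plan` and `supervise` are hypothetical names, the plan is hard-coded, and the sub-agents are stub functions standing in for LLM-backed ReAct agents.

```python
from typing import Callable

# Hypothetical sub-agents: stubs standing in for LLM-backed ReAct agents.
SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "researcher": lambda task: f"research notes on {task}",
    "analyst": lambda task: f"analysis of {task}",
}

def plan(task: str) -> list[tuple[str, str]]:
    """Step 1: decompose the task into (sub-agent, subtask) pairs.
    A real supervisor would ask a planning LLM; this plan is hard-coded."""
    return [("researcher", f"sources for {task}"), ("analyst", f"figures for {task}")]

def supervise(task: str, max_rounds: int = 3) -> str:
    results: dict[str, str] = {}
    pending = plan(task)
    for _ in range(max_rounds):  # bounded loop guards against endless delegation
        for agent_name, subtask in pending:
            # Steps 2-3: select the sub-agent, delegate, collect the partial result.
            results[subtask] = SUB_AGENTS[agent_name](subtask)
        # Step 4: check completeness; re-delegate anything still missing.
        pending = [(a, s) for a, s in plan(task) if s not in results]
        if not pending:
            break
    # Aggregate the partial results in plan order into one answer.
    return " | ".join(results[s] for _, s in plan(task))

answer = supervise("market entry")
```

The `max_rounds` bound is a design choice of this sketch: it caps the number of delegation rounds so that a faulty plan cannot loop indefinitely.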
The separation of roles is important. The supervisor makes planning and coordination decisions; it does not perform the domain-level detail work itself. It is precisely this decoupling that enables a cost-efficient model choice — an insight that stems directly from the Plan-and-Execute pattern: a large model for the difficult planning task, smaller and cheaper models for the per-step execution in the sub-agents.
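The cost effect of this model choice is simple arithmetic. The sketch below assumes made-up prices and token counts (`PRICE_PER_MTOK` and the figures passed to `workflow_cost` are hypothetical, not vendor pricing) and only shows how the calculation is structured.

```python
# Hypothetical prices in USD per 1M tokens; not real vendor pricing.
PRICE_PER_MTOK = {"large": 10.0, "small": 2.5}

def workflow_cost(supervisor_tokens: int, worker_tokens: int, worker_model: str) -> float:
    """Cost of one run; the supervisor always uses the large model."""
    return (
        supervisor_tokens * PRICE_PER_MTOK["large"]
        + worker_tokens * PRICE_PER_MTOK[worker_model]
    ) / 1_000_000

# Assumed load: 200k supervisor tokens, 800k worker tokens per run.
uniform = workflow_cost(200_000, 800_000, worker_model="large")
tiered = workflow_cost(200_000, 800_000, worker_model="small")
saving = 1 - tiered / uniform  # share of cost saved by tiering
```

The result depends entirely on the workers' share of the tokens and on the price ratio between the two models, so the inputs should be replaced with measured values from your own workload.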
Diagram in words
Picture a tree with three levels. At the root sits the user request. One level down stands the supervisor node, which splits the request into several branches. Each branch ends in a sub-agent node — for example "researcher", "data analyst", "copywriter" — which in turn runs through its own small ReAct loop (Thought → Action → Observation) with its tools. The results flow back along the branches to the supervisor, which merges them into an answer at the root. This is exactly the picture also described by the research pattern ReAcTree (Choi et al., arXiv:2511.02424, AAMAS 2026), which extends ReAct into a hierarchical agent tree and achieves around +30 percentage points over plain ReAct on the WAH-NL benchmark.
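The tree described above maps onto a small recursive data structure. This is an illustrative sketch, not the ReAcTree implementation: `Node` and `run` are hypothetical names, and a leaf simply returns a string where a real sub-agent would run its own ReAct loop.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

    def run(self, request: str) -> str:
        if not self.children:
            # Leaf = sub-agent; a real one would run its own ReAct loop here.
            return f"{self.name}({request})"
        # Inner node = supervisor: fan out to the children, then merge the results.
        parts = [child.run(request) for child in self.children]
        return f"{self.name}[" + ", ".join(parts) + "]"

tree = Node("supervisor", [Node("researcher"), Node("data analyst"), Node("copywriter")])
merged = tree.run("user request")
```

Because `run` is recursive, the same structure also covers deeper hierarchies in which a sub-agent acts as supervisor for its own children.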
When hierarchy scales
Hierarchy is not an end in itself. It pays off when the task structure allows for it:
- Decomposability: The task breaks down into more than three largely independent subtasks.
- Specialisation: The subtasks require different domain knowledge or different tools.
- Long horizon: A single agent would lose the original goal from context over many steps (in practice, the limit of a ReAct loop is often 10 to 25 steps before context loss or "reasoning drift" dominates).
- Parallelism: Several subtasks can be carried out simultaneously.
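The four criteria can be written down as an explicit checklist. The sketch below is a hypothetical helper, not an established rule: `TaskProfile` and `hierarchy_signals` are invented names, and the default `step_limit` of 25 takes the upper end of the step range mentioned above as an assumption.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    independent_subtasks: int   # number of largely independent subtasks
    needs_distinct_tools: bool  # subtasks need different knowledge or tools
    expected_steps: int         # estimated steps for a single agent
    parallelisable: bool        # subtasks can run at the same time

def hierarchy_signals(p: TaskProfile, step_limit: int = 25) -> list[str]:
    """Return which of the four criteria a task meets."""
    signals = []
    if p.independent_subtasks > 3:
        signals.append("decomposability")
    if p.needs_distinct_tools:
        signals.append("specialisation")
    if p.expected_steps > step_limit:
        signals.append("long horizon")
    if p.parallelisable:
        signals.append("parallelism")
    return signals

report = TaskProfile(independent_subtasks=5, needs_distinct_tools=True,
                     expected_steps=40, parallelisable=True)
quick_fix = TaskProfile(independent_subtasks=1, needs_distinct_tools=False,
                        expected_steps=6, parallelisable=False)
```

The function only reports which signals are present; whether they justify a hierarchy still has to be confirmed by measuring against a single-agent baseline.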
The most important guideline from production use comes from Cognition (the team behind Devin). In "Don't Build Multi-Agents" (June 2025), the team warned that parallel agents make implicitly contradictory decisions and that the result becomes fragile; Devin deliberately started single-threaded, with read-only sub-agents only. In "Multi-Agents: What's Actually Working" (April 2026), the position was refined: multi-agent is viable for read-parallel, write-single-threaded setups. Concretely, Devin today uses a manager-Devin that spawns child-Devins via an internal MCP. In practical terms: use one strong agent with tools as the default and add parallel sub-agents only for information gathering, never for write operations or state changes.
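One way to enforce "read-parallel, write-single-threaded" is to hand sub-agents only a read function and keep every state change in the supervisor. The following is a minimal sketch of that idea, not Cognition's implementation; `Workspace`, `read_only_worker` and `supervisor` are hypothetical names and the stored documents are placeholder data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

class Workspace:
    """Shared state; only the supervisor is meant to call write()."""
    def __init__(self) -> None:
        self.documents = {"pricing": "tier A, tier B", "seo": "rank 3"}
        self.report: list[str] = []

    def read(self, key: str) -> str:
        return self.documents[key]

    def write(self, line: str) -> None:
        self.report.append(line)

def read_only_worker(read: Callable[[str], str], key: str) -> str:
    # Sub-agents receive only the read function, so they cannot change state.
    return f"{key}: {read(key)}"

def supervisor(ws: Workspace, keys: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:
        # Read-parallel: fan out information gathering across threads.
        findings = list(pool.map(lambda k: read_only_worker(ws.read, k), keys))
    # Write-single-threaded: one actor applies all state changes, in order.
    for finding in findings:
        ws.write(finding)
    return ws.report

report = supervisor(Workspace(), ["pricing", "seo"])
```

Passing `ws.read` instead of the whole workspace is the design choice that matters here: the restriction is structural rather than a convention the workers are trusted to follow.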
Advantages and risks
| Dimension | Advantage | Risk / cost |
|---|---|---|
| Specialisation | Each sub-agent has a lean prompt, clear tools, high hit rate | Incorrect task decomposition leads sub-agents nowhere |
| Modularity | Sub-agents are individually testable, replaceable, reusable | More moving parts, higher architectural complexity |
| Cost | Cheap model for workers, expensive only for the supervisor (40–70% savings with model tiering, as of 2026) | Every delegation and aggregation costs additional LLM calls |
| Latency | Parallel sub-agents shorten the total duration | Sequential delegation adds coordination latency |
| Robustness | Responsibility concentrated at the supervisor | Context loss on hand-off; contradictory assumptions of parallel workers |
| Auditability | Clear, traceable delegation and planning trails (important for GDPR / EU AI Act) | Full trace persistence across all sub-agents required |
The dominant drawback is coordination overhead. Each additional agent means additional LLM calls for planning, delegation and aggregation, as well as the risk that information is lost when context is passed on. This is why the overarching rule of thumb from Anthropic's essay and Cognition's blog applies: start with the simplest pattern that works (usually a single ReAct agent), and only escalate to planning, reflection or hierarchy once measured failure modes force it. Unnecessary framework abstraction is, according to Anthropic, the central anti-pattern.
Distinction from flat multi-agent systems
In a flat multi-agent system, peer agents communicate directly with each other — there is no central control instance that delegates and aggregates. This is flexible, but it is precisely here that the problem described by Cognition lies: without a coordinating instance, the agents easily make implicitly contradictory decisions because none of them holds the overall context.
In the hierarchical system, the supervisor concentrates responsibility. It is the only actor with an overall view, assigns clear subtasks and integrates the results. This reduces the risk of contradictory decisions — at the cost of a central bottleneck and additional coordination steps. For most DACH B2B scenarios with compliance requirements, the hierarchical variant is the more predictable one, because the delegation path remains auditable.
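An auditable delegation path means that every hand-off between supervisor and sub-agent is recorded. The sketch below shows one possible shape for such a trace; `DelegationTrace`, `TraceEvent` and `redact` are hypothetical names, and the redaction masks e-mail addresses only, which is far from a complete PII filter.

```python
import json
import re
from dataclasses import dataclass, asdict

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text: str) -> str:
    """Very rough PII redaction: masks e-mail addresses only."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

@dataclass
class TraceEvent:
    step: int
    sender: str    # who delegated or reported
    receiver: str
    payload: str   # stored after redaction

class DelegationTrace:
    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def record(self, sender: str, receiver: str, payload: str) -> None:
        # Number events consecutively and redact before anything is persisted.
        self.events.append(
            TraceEvent(len(self.events) + 1, sender, receiver, redact(payload))
        )

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for append-only log storage.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

trace = DelegationTrace()
trace.record("supervisor", "researcher", "Find contact for jane.doe@example.com")
trace.record("researcher", "supervisor", "Found 3 public sources")
```

Redacting at record time, rather than at export time, keeps unredacted payloads out of the stored trace altogether.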
Frameworks and implementation (as of 2026)
- CrewAI: the most direct fit. `Process.hierarchical` activates a manager (`manager_agent` or `manager_llm`) that delegates tasks to the crew and adds a managerial replanning layer. With `Crew(planning=True)`, an `AgentPlanner` additionally generates a step-by-step plan before each iteration. CrewAI is widely used in the DACH marketing automation space.
- LangGraph: implemented as a state graph with one supervisor node and several ReAct sub-agents (`create_react_agent`, increasingly `langchain.agents.create_agent` with middleware in the 2026 API). The Plan-and-Execute and ReWOO tutorials provide the building blocks for planning and delegated execution.
- Microsoft Agent Framework: Group Chat with a dedicated planner agent and worker agents; the productionised SPAR cycle (Sense → Plan → Act → Reflect) combines ReAct and reflection. AutoGen is in maintenance mode; new projects are directed to the Microsoft Agent Framework.
- n8n: realistic for orchestrated batch workflows (planner agent plus workers exposed as tools via sub-workflows); stateful loops across executions, however, are possible only with external memory.
Practical example: competitive analysis as an agency workflow
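An agency is to produce a competitive analysis for a B2B client. A single agent would lose the goal over many tool calls. Hierarchically, the process looks like this (a Python sketch in which `run_agent` is a hypothetical stub for an LLM-backed agent):
```python
import asyncio

async def run_agent(name: str, task: str) -> str:
    # Hypothetical stub: a real sub-agent would call an LLM with its own tools.
    return f"[{name}] {task}"

async def supervisor(goal: str) -> str:
    evidence = await asyncio.gather(  # subtasks 1-3 run in parallel (read-only)
        run_agent("research_agent", "Top 5 competitors + positioning"),
        run_agent("pricing_agent", "capture public pricing models"),
        run_agent("seo_agent", "organic visibility of competitors"),
    )
    brief = f"management summary for {goal}: " + "; ".join(evidence)  # aggregate
    return await run_agent("writer_agent", brief)  # single writing step, single-threaded

print(asyncio.run(supervisor("Competitive analysis for client X")))
```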
Three read sub-agents work in parallel and deliver evidence; the supervisor merges them, and exactly one writing step (drafting the summary) remains single-threaded. The workflow thus follows precisely the "read-parallel, write-single-threaded" principle recommended by Cognition. Through model tiering — a large model for the supervisor and copywriter, a cheaper one for the read workers — savings of around 40–70% across multi-stage workflows are realistic compared with a uniformly expensive variant (order of magnitude, as of 2026; measure against your own load).
For agencies and B2B
Hierarchical agents are not a default but a deliberate escalation. For agencies this means: decomposable, recurring deliverables (research briefings, reporting, content pipelines) lend themselves to a supervisor with specialised read sub-agents, while the final, state-changing step stays centralised. For DACH B2B decision-makers, the auditability of the delegation path (full trace persistence with PII redaction) is a concrete added value with regard to GDPR and the EU AI Act. Anyone wanting to evaluate a hierarchical agent workflow for marketing or operations should start with a measurable individual case, keep the simplest working pattern as a baseline, and only switch to hierarchy once the benefit has been demonstrated. Blck Alpaca supports this architecture decision from task decomposition through to a production-ready, compliant implementation.