Plan-and-Execute: Separating Planning from Execution
Plan-and-Execute is an agent architecture in which a Planner first creates a complete multi-step plan and an Executor works through it step by step. A Replanner adjusts the plan when needed. Separating planning from execution reduces LLM calls and improves control over long-horizon tasks compared with pure ReAct.
Key Takeaways
- Plan-and-Execute separates the heavy reasoning task (planning) from the cheap step-by-step execution - this enables model tiering: a strong model as Planner, an inexpensive model as Executor.
- Compared with pure ReAct, the architecture saves on the order of 30-60 percent in tokens on multi-tool tasks according to LangChain field reports, because the Planner is not consulted again after every action.
- A Replanner decides after each step: terminate or adjust the remaining plan - this provides robustness on long-horizon tasks that ReAct loses through reasoning drift.
- The architecture was popularised by LangChain as an agent port of Plan-and-Solve prompting (Wang et al. 2023, arXiv:2305.04091), inspired by BabyAGI.
- Weaknesses: brittle plans waste calls on hopeless steps, no true parallelism (steps run sequentially), and frequent replanning eats up the cost advantage.
- Use it when tasks can be decomposed into more than three independent steps, when plan validity can be clearly verified, and when latency is not directly user-visible - not in highly stochastic environments.
The quick answers up front:
- What it solves: ReAct's step-by-step reasoning lacks a global overview - on long tasks the agent forgets the original goal. Plan-and-Execute enforces the goal through an explicit plan created in advance.
- Why it is cheaper: The Planner is not consulted again after every action. This enables model tiering (a large model plans, a cheap one executes) and, on multi-tool tasks, saves on the order of 30-60 percent in tokens according to LangChain field reports.
- The central trade-off: Better control and auditable plans are weighed against plan brittleness and the lack of true parallelism - in highly stochastic environments, ReAct's reactive loop is usually the better choice.
Origins: Plan-and-Solve, BabyAGI and the LangChain port
Precise attribution matters here, because the lineage is often shortened. The underlying prompting pattern comes from Plan-and-Solve Prompting (Wang et al., arXiv:2305.04091, ACL 2023). The paper proposed a two-stage zero-shot prompt - in essence "Let us first devise a plan / Let us carry out the plan" - and with it outperformed zero-shot CoT on mathematical reasoning.
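To make the two-stage idea concrete, here is a minimal sketch of how such a zero-shot prompt could be assembled. The trigger sentence is a paraphrase, not a verified quotation of the paper, and `build_prompt` is a hypothetical helper name used only for illustration.

```python
# Trigger sentence PARAPHRASED from the Plan-and-Solve idea (arXiv:2305.04091).
# Assumption: check the paper for the exact wording before relying on it.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)
COT_TRIGGER = "Let's think step by step."  # the zero-shot CoT baseline


def build_prompt(question: str, trigger: str = PS_TRIGGER) -> str:
    """Hypothetical helper: question first, trigger sentence as the answer prefix."""
    return f"Q: {question.strip()}\nA: {trigger}"


prompt = build_prompt("A shop sells 3 pens for 2 euros. What do 12 pens cost?")
```

The only difference to zero-shot CoT in this sketch is the trigger sentence: the model is asked to plan first and only then to solve.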
The agent name "Plan-and-Execute", however, comes from the LangChain port, not from the original paper. The architecture was additionally inspired by BabyAGI (Yohei Nakajima, 2023). Phrased correctly, it is therefore: the Plan-and-Execute agent architecture popularised by LangChain, based on Plan-and-Solve prompting.
The three components
The LangGraph variant consists of three nodes:
- Planner - generates a numbered, multi-step plan once (`Plan: List[str]`).
- Executor - typically a ReAct sub-agent that works through one plan step at a time.
- Replanner - decides after each execution: either return a final `Response` (terminate) or output a new, possibly revised, plan for the remaining work.
The flow:
Input → Planner → [Plan: Step_1, Step_2, …] → Executor(Step_1) → Replanner → (Executor(Step_2) → Replanner → …) → Response
The central innovation: Plan-and-Execute decouples planning (a heavy reasoning task for which a large model is worthwhile) from execution (per-step tool use, for which a smaller, cheaper model suffices).
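The shared state and the routing decision after the Replanner can be sketched in a few lines. The field names follow the ones mentioned in this article; the `TypedDict` layout and the `next_node` function are assumptions for illustration, not the LangGraph implementation.

```python
from typing import List, Optional, Tuple, TypedDict


class PlanExecuteState(TypedDict):
    """State handed from node to node (layout is an assumption)."""
    input: str                          # the original task
    plan: List[str]                     # steps still to be executed
    past_steps: List[Tuple[str, str]]   # (step, result) pairs already completed
    response: Optional[str]             # final answer, None while work remains


def next_node(state: PlanExecuteState) -> str:
    """Routing after the Replanner: stop once a response exists.

    Assumption: an empty plan without a response also ends the run.
    """
    if state["response"] is not None or not state["plan"]:
        return "END"
    return "executor"


state: PlanExecuteState = {
    "input": "Write the daily report",
    "plan": ["Identify competitors"],
    "past_steps": [],
    "response": None,
}
```

With the example state, `next_node(state)` routes to the Executor; once `response` is set, it routes to `END`.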
Advantages over pure ReAct
- Fewer planning calls, faster multi-step execution: The Planner is not queried after every action.
- Cost savings through model tiering: Planner in GPT-4 class, Executor in a cheaper model. Empirically, the LangChain blog cites a token reduction on the order of 30-60 percent compared with pure ReAct on multi-tool tasks.
- Explicit, auditable plans: A strong argument for enterprise and compliance applications - the plan is laid out before execution.
- Global overview: The overall goal is preserved across long trajectories, rather than being lost through ReAct's "reasoning drift".
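The tiering argument can be made tangible with a toy cost model. Everything here is an assumption: the token counts, the relative prices (big model = 10, small model = 1 per token) and the simplification that the Replanner only sees the base prompt. The sketch does not reproduce the 30-60 percent figure; it only shows where the difference comes from.

```python
def react_cost(n_steps: int, base_tokens: int, tokens_per_step: int, big_price: int) -> int:
    """ReAct: every step re-sends the base prompt plus all earlier step traces to the big model."""
    total_tokens = sum(base_tokens + i * tokens_per_step for i in range(n_steps))
    return total_tokens * big_price


def plan_execute_cost(n_steps: int, base_tokens: int, tokens_per_step: int,
                      big_price: int, small_price: int, replans: int = 0) -> int:
    """Plan-and-Execute: planner and replanner calls on the big model, steps on the small one."""
    # Simplification (assumption): each planner/replanner call costs base_tokens.
    planner_tokens = base_tokens * (1 + replans)
    # Each executor call sees the base prompt plus its own step, not the whole history.
    executor_tokens = n_steps * (base_tokens + tokens_per_step)
    return planner_tokens * big_price + executor_tokens * small_price


react = react_cost(12, 1000, 500, big_price=10)
tiered = plan_execute_cost(12, 1000, 500, big_price=10, small_price=1)
tiered_noisy = plan_execute_cost(12, 1000, 500, big_price=10, small_price=1, replans=12)
```

With these toy inputs, `react` is 450000 cost units, `tiered` is 28000, and `tiered_noisy` (a big-model replan after every step) rises to 148000. The gap depends entirely on the assumed prices and context sizes.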
Weaknesses and failure modes
Plan-and-Execute is no panacea. The key limitations:
- Plan brittleness: If the up-front plan is wrong, the Executor wastes calls on hopeless steps until the Replanner notices. The Plan-and-Solve paper itself documents calculation errors, missing steps and semantic misunderstandings.
- No true parallelism: Plans are lists; the Executor works through them sequentially. This is exactly what LLMCompiler (Kim et al., arXiv:2312.04511) addresses by emitting a DAG instead of a list.
- Replan costs: Each replan calls the large model again. In noisy environments, the Replanner becomes active at nearly every step and erodes the cost advantage.
- Model choice is decisive: Smaller Executor models (e.g. `gpt-4o-mini`) trigger "frequent replanning" according to the LangChain tutorial.
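To illustrate the difference between a list and a DAG, the following sketch groups plan steps into "waves" that could run in parallel once their dependencies are met. It uses Python's standard `graphlib` module and is not the LLMCompiler implementation; the step names and the `parallel_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; every step in a wave has all its prerequisites done."""
    sorter = TopologicalSorter(deps)   # maps each step to the steps it depends on
    sorter.prepare()                   # raises CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps that are unblocked right now
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unblock successors
    return waves


# Hypothetical plan: three independent scrapes feed one summary step.
plan = {
    "scrape_A": set(),
    "scrape_B": set(),
    "scrape_C": set(),
    "summarise": {"scrape_A", "scrape_B", "scrape_C"},
    "send_email": {"summarise"},
}
waves = parallel_waves(plan)
```

A list-based plan would need five sequential executions here; the dependency view needs three waves, because the three scrapes do not depend on each other.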
ReAct vs. Plan-and-Execute in direct comparison
| Dimension | ReAct | Plan-and-Execute |
|---|---|---|
| Planning | Implicit, step by step (reactive) | Explicit, complete plan up front |
| LLM calls | One call per step | One plan call + cheap execution, replan only when needed |
| Token effort (rel. to CoT = 1) | approx. 3-10x | approx. 2-6x |
| Latency (N tool steps) | N x sequential | 1 plan + N sequential executions |
| Model tiering | Difficult (one loop, one model) | Natural (Planner large, Executor small) |
| Robustness on long-horizon goals | Weak (reasoning drift) | Strong (goal stays fixed in the plan) |
| Robustness under stochasticity | Strong (reactive) | Weak (plan can become stale) |
| Auditability | Trace per step | Explicit plan up front - compliance-friendly |
| Implementation complexity | Low (a one-liner in LangGraph) | Medium |
The token figures should be understood as orders of magnitude, synthesised from paper and field reports - not direct measurements. Measure on your own workload.
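Measuring on your own workload mostly means counting calls and tokens per role. A minimal sketch of such a ledger follows; `TokenLedger` is a hypothetical helper, and the token numbers in the usage lines are made up. In practice the counts would come from the usage metadata your model provider returns.

```python
from collections import defaultdict
from typing import Dict, Tuple


class TokenLedger:
    """Accumulates call counts and token totals per role (planner, executor, ...)."""

    def __init__(self) -> None:
        self.tokens = defaultdict(int)
        self.calls = defaultdict(int)

    def record(self, role: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one model call to the running totals of the given role."""
        self.tokens[role] += prompt_tokens + completion_tokens
        self.calls[role] += 1

    def summary(self) -> Dict[str, Tuple[int, int]]:
        """Return {role: (number_of_calls, total_tokens)} sorted by role name."""
        return {role: (self.calls[role], self.tokens[role]) for role in sorted(self.tokens)}


ledger = TokenLedger()
ledger.record("planner", prompt_tokens=900, completion_tokens=150)   # made-up numbers
ledger.record("executor", prompt_tokens=300, completion_tokens=80)
ledger.record("executor", prompt_tokens=320, completion_tokens=60)
```

Running the same tasks once through a ReAct loop and once through Plan-and-Execute with such a ledger gives a like-for-like comparison instead of an order-of-magnitude estimate.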
Pseudocode: the Plan-and-Execute loop
Simplified, modelled on the LangGraph state-schema logic:
```
# State: input, plan (list), past_steps (completed), response

def planner(state):
    # a large model creates the complete plan ONCE
    state.plan = big_model.plan(state.input)  # e.g. ["1. ...", "2. ...", "3. ..."]
    return state

def executor(state):
    step = state.plan[0]                      # next open step
    result = small_model.run_react(step)      # cheap ReAct sub-agent + tools
    state.past_steps.append((step, result))
    return state

def replanner(state):
    # either done -> Response, or output a new remaining plan
    if big_model.is_done(state):
        state.response = big_model.final_answer(state)
        return state, "END"
    state.plan = big_model.replan(state.input, state.past_steps)
    return state, "executor"

# Graph: planner -> executor -> replanner -> (executor | END)
```
The decisive point: big_model (expensive) runs only in planner and replanner, while small_model (cheap) carries the actual execution load.
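For readers who want something they can actually run, here is a minimal sketch of the same loop with the models replaced by plain callables. All names (`run`, `make_plan`, `run_step`, `replan`, `finalise`) are hypothetical, and the `max_steps` guard is an added assumption to keep a misbehaving Replanner from looping forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    input: str
    plan: List[str] = field(default_factory=list)                    # remaining steps
    past_steps: List[Tuple[str, str]] = field(default_factory=list)  # (step, result)
    response: Optional[str] = None


def run(task: str,
        make_plan: Callable[[str], List[str]],     # stands in for the big model (Planner)
        run_step: Callable[[str], str],            # stands in for the small model (Executor)
        replan: Callable[[State], List[str]],      # big model: returns the remaining plan
        finalise: Callable[[State], str],          # big model: writes the final answer
        max_steps: int = 20) -> State:
    state = State(input=task, plan=make_plan(task))  # the plan is created once
    for _ in range(max_steps):
        if not state.plan:                           # empty plan means we are done
            break
        step = state.plan[0]
        state.past_steps.append((step, run_step(step)))
        state.plan = replan(state)                   # keep, shorten or revise the rest
    state.response = finalise(state)
    return state


# Stub "models" so the loop runs without any API access.
result = run(
    "daily report",
    make_plan=lambda task: ["find sources", "summarise"],
    run_step=lambda step: f"done: {step}",
    replan=lambda state: state.plan[1:],             # trivial replanner: drop the finished step
    finalise=lambda state: "; ".join(r for _, r in state.past_steps),
)
```

Swapping the stubs for real model calls leaves the control flow untouched, which is the point of the separation: only `make_plan`, `replan` and `finalise` need the expensive model.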
A concrete example: daily marketing report
An agency builds an agent that produces a competitive report every morning. Task: "Research the three most important competitors, gather their new content from the last 24 hours, summarise it and send it by email."
With pure ReAct, the agent would call the full model again after every single action (search, scrape, summarise), including the entire context window so far - so with 12 steps, that is 12 expensive calls plus a growing prefix.
With Plan-and-Execute, the Planner creates the plan once (1 expensive call):
```
Plan:
1. Identify competitors A, B, C
2. For each: scrape new blog/social content from the last 24h
3. Summarise briefly per competitor
4. Format the overall report
5. Send by email to the team
```
Steps 2-5 then run through a cheap Executor. The Replanner only intervenes if, for example, a source is unreachable. For this kind of decomposable batch task, the token savings fall within the stated range of around 30-60 percent - and the plan is documented in a way the agency can follow.
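Before the Executor can work on a plan like the one above, the Planner's text output has to become a list of steps. A small sketch of such a parser follows; `parse_plan` and its regular expression are hypothetical, and a production setup would more likely ask the model for structured output directly.

```python
import re
from typing import List

# Matches "1. text", "2) text", "- text" or "* text" and captures the step text.
_STEP = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")


def parse_plan(text: str) -> List[str]:
    """Extract the step texts from a numbered or bulleted plan; other lines are ignored."""
    steps = []
    for line in text.splitlines():
        match = _STEP.match(line)
        if match:
            steps.append(match.group(1))
    return steps


raw_plan = """Plan:
1. Identify competitors A, B, C
2. For each: scrape new blog/social content from the last 24h
3. Summarise briefly per competitor
4. Format the overall report
5. Send by email to the team
"""
steps = parse_plan(raw_plan)
```

The header line `Plan:` is skipped because it carries no list marker, so `steps` contains exactly the five work items.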
Use in frameworks (as of 2026)
- LangGraph: Native, with an official tutorial. Canonical state schema with `input`, `plan: List[str]`, `past_steps` and `response`; nodes `planner`, `agent` (Executor) and `replan`. A conditional edge `should_end` leads back to the Executor or to `END`.
- CrewAI: A direct fit via `Crew(planning=True)`. This activates an `AgentPlanner` that creates a step-by-step plan before each crew iteration and injects it into every task description. With `Process.hierarchical` plus `manager_agent` / `manager_llm`, a managing replanning layer is added.
- AutoGen / Microsoft Agent Framework: Mapped via Group Chat with a dedicated `Planner` agent plus worker agents (`RoundRobinGroupChat` or `SelectorGroupChat`). In the MS Agent Framework, the SPAR cycle (Sense → Plan → Act → Reflect) is the productised variant. As of 2026, AutoGen is in maintenance mode; Microsoft points new projects to the MS Agent Framework.
- n8n: No native Plan-and-Execute node, but realistic with two AI Agent nodes (Planner + Executor) connected via a SplitInBatches loop over the plan items. Limitation: n8n loops are not stateful across executions without external storage - suitable for batch jobs (daily reports, content pipelines), not for high-frequency real-time agents.
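The "external storage" point from the n8n item applies to any batch setup: if plan progress lives only in memory, an interrupted run starts from scratch. The sketch below persists progress to a JSON file after every step. It is plain Python, not n8n code, and `run_with_checkpoint` and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def run_with_checkpoint(plan: List[str], run_step: Callable[[str], str], path: Path) -> dict:
    """Execute plan steps and save progress after each one so a later run can resume."""
    if path.exists():
        state = json.loads(path.read_text(encoding="utf-8"))   # resume the stored plan
    else:
        state = {"plan": plan, "done": []}
    for step in state["plan"][len(state["done"]):]:            # skip finished steps
        state["done"].append({"step": step, "result": run_step(step)})
        path.write_text(json.dumps(state), encoding="utf-8")   # checkpoint after each step
    return state


# Usage with a fake executor: the second call finds nothing left to do.
executed = []

def fake_step(step: str) -> str:
    executed.append(step)
    return f"ok: {step}"

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "plan_state.json"
    first = run_with_checkpoint(["scrape", "summarise"], fake_step, checkpoint)
    second = run_with_checkpoint(["scrape", "summarise"], fake_step, checkpoint)
```

Note the design choice: when a checkpoint exists, the stored plan wins over the plan passed in, so a resumed run never mixes two different plans.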
When to use it - and when not to
Use Plan-and-Execute when (a) tasks can be decomposed into more than three independent steps, (b) you have a clear way to verify that a plan is valid, and (c) latency is not directly user-visible.
Avoid it when the environment is highly stochastic - if every observation can invalidate the plan (for example interactive web navigation), ReAct's reactive loop is usually the better fit. For compliance-critical DACH workflows (GDPR, EU AI Act), Plan-and-Execute with a human in the loop is a good fit: the plan can be reviewed and approved before anything runs, and execution then follows the approved steps.
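A human-in-the-loop gate can be sketched as an approval callback that runs before any step is executed, with every event written to an audit trail. The function and event names are hypothetical; a real deployment would route `approve` to a review UI or ticket system and store the log durably.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def _event(name: str, **details) -> Dict:
    """Build one audit record with a UTC timestamp."""
    return {"event": name, "at": datetime.now(timezone.utc).isoformat(), **details}


def execute_with_approval(plan: List[str],
                          approve: Callable[[List[str]], bool],
                          run_step: Callable[[str], str]) -> List[Dict]:
    """Run the plan only after explicit approval; return the audit trail."""
    audit = [_event("plan_proposed", plan=list(plan))]
    if not approve(plan):                       # the human (or policy check) says no
        audit.append(_event("plan_rejected"))
        return audit                            # nothing was executed
    audit.append(_event("plan_approved"))
    for step in plan:
        audit.append(_event("step_done", step=step, result=run_step(step)))
    return audit


# Usage: auto-approve only plans that do not send anything externally (toy policy).
log = execute_with_approval(
    ["collect figures", "draft report"],
    approve=lambda plan: not any("send" in step for step in plan),
    run_step=lambda step: f"ok: {step}",
)
```

Because the plan is recorded before approval and every step is logged afterwards, the trail shows both what was intended and what actually happened.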
For agencies and B2B decision-makers
Plan-and-Execute is the natural pattern for plannable, recurring multi-step processes of the kind that are everyday occurrences in agencies: reporting pipelines, research workflows, content production. The dual gain - measurable cost reduction through model tiering and an up-front, verifiable plan - hits exactly the requirements of compliance and budget control in the DACH B2B environment. The pragmatic advice from production experience applies here too, however: start with the simplest pattern that works (usually ReAct) and only escalate to Plan-and-Execute once measured failure modes - reasoning drift, cost, lack of overview - justify it. If you are evaluating which agent architecture fits your specific use case, Blck Alpaca can support you with selection, model tiering and auditable implementation.