Plan-and-Execute: Separating Planning from Execution

Blck Alpaca
Definition

Plan-and-Execute is an agent architecture in which a Planner first creates a complete multi-step plan and an Executor works through it step by step. A Replanner adjusts the plan when needed. Separating planning from execution reduces LLM calls and improves control over long-horizon tasks compared with pure ReAct.

Key Takeaways

  • Plan-and-Execute separates the heavy reasoning task (planning) from the cheap step-by-step execution - this enables model tiering: a strong model as Planner, an inexpensive model as Executor.
  • Compared with pure ReAct, the architecture saves on the order of 30-60 percent in tokens on multi-tool tasks according to LangChain field reports, because the Planner is not consulted again after every action.
  • A Replanner decides after each step: terminate or adjust the remaining plan - this provides robustness on long-horizon tasks that ReAct loses through reasoning drift.
  • The architecture was popularised by LangChain as an agent port of Plan-and-Solve prompting (Wang et al. 2023, arXiv:2305.04091), inspired by BabyAGI.
  • Weaknesses: brittle plans waste calls on hopeless steps, no true parallelism (steps run sequentially), and frequent replanning eats up the cost advantage.
  • Use it when tasks can be decomposed into more than three independent steps, when plan validity can be clearly verified, and when latency is not directly user-visible - not in highly stochastic environments.

The quick answers up front:

  • What it solves: ReAct's step-by-step reasoning lacks a global overview - on long tasks the agent forgets the original goal. Plan-and-Execute enforces the goal through an explicit plan created in advance.
  • Why it is cheaper: The Planner is not consulted again after every action. This enables model tiering (a large model plans, a cheap one executes) and, on multi-tool tasks, saves on the order of 30-60 percent in tokens according to LangChain field reports.
  • The central trade-off: Better control and auditable plans are weighed against plan brittleness and the lack of true parallelism - in highly stochastic environments, ReAct is strictly superior.

Origins: Plan-and-Solve, BabyAGI and the LangChain port

A precise classification matters here, because the attribution is often abbreviated. The underlying prompting pattern comes from Plan-and-Solve Prompting (Wang et al., arXiv:2305.04091, ACL 2023). The paper proposed the two-stage zero-shot prompt - in essence "Let us first devise a plan / Let us carry out the plan" - and thereby beat zero-shot CoT on mathematical reasoning.
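
As a rough illustration of the two-stage idea, the sketch below appends a Plan-and-Solve style trigger sentence to a question. The function name `build_ps_prompt` is hypothetical, and the exact trigger wording should be checked against arXiv:2305.04091 before it is relied on.

```python
# Trigger sentence in the style of Plan-and-Solve prompting (verify wording against the paper).
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)


def build_ps_prompt(question: str) -> str:
    """Return a zero-shot prompt: the question followed by the plan-then-solve trigger."""
    return f"Q: {question.strip()}\nA: {PS_TRIGGER}"


prompt = build_ps_prompt("A shop sells 3 pens for 2 euros. How much do 12 pens cost?")
print(prompt)
```

Planning and solving still happen inside a single model response here; the agent architecture described in this article splits them into separate components.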

The agent name "Plan-and-Execute", however, comes from the LangChain port, not from the original paper. The architecture was additionally inspired by BabyAGI (Yohei Nakajima, 2023). Phrased correctly, it is therefore: the Plan-and-Execute agent architecture popularised by LangChain, based on Plan-and-Solve prompting.

The three components

The LangGraph variant consists of three nodes:

  1. Planner - generates a numbered, multi-step plan once (Plan: List[str]).
  2. Executor - typically a ReAct sub-agent that works through one plan step at a time.
  3. Replanner - decides after each execution: either return a final Response (terminate) or output a new, possibly revised, plan for the remaining work.
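
The three components can be sketched as plain data types. This is a minimal sketch under assumptions: the field names `input`, `plan`, `past_steps` and `response` follow the state schema named in this article, while the class names and the helper `apply_replanner_output` are hypothetical and not taken from LangGraph.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Plan:
    steps: List[str]  # remaining steps, in execution order


@dataclass
class Response:
    text: str  # final answer; signals termination


# The Replanner returns exactly one of the two.
ReplannerOutput = Union[Plan, Response]


@dataclass
class PlanExecuteState:
    input: str
    plan: List[str] = field(default_factory=list)
    past_steps: List[Tuple[str, str]] = field(default_factory=list)  # (step, result)
    response: str = ""


def apply_replanner_output(state: PlanExecuteState, out: ReplannerOutput) -> PlanExecuteState:
    """Either store the final response or swap in the revised remaining plan."""
    if isinstance(out, Response):
        state.response = out.text
    else:
        state.plan = out.steps
    return state


state = PlanExecuteState(input="daily report")
state = apply_replanner_output(state, Plan(["scrape", "summarise"]))
state = apply_replanner_output(state, Response("report sent"))
```

Modelling the Replanner output as a two-way union makes the "terminate or continue" decision explicit in the type rather than in a flag.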

The flow:

Input → Planner → [Plan: Step_1, Step_2, …] → Executor(Step_1) → Replanner → (Executor(Step_2) → Replanner → …) → Response

The central innovation: Plan-and-Execute decouples planning (a heavy reasoning task for which a large model is worthwhile) from execution (per-step tool use, for which a smaller, cheaper model suffices).

Advantages over pure ReAct

  • Fewer planning calls, faster multi-step execution: The Planner is not queried after every action.
  • Cost savings through model tiering: a GPT-4-class model as Planner, a cheaper model as Executor. The LangChain blog cites a token reduction on the order of 30-60 percent compared with pure ReAct on multi-tool tasks.
  • Explicit, auditable plans: A strong argument for enterprise and compliance applications - the plan is laid out before execution.
  • Global overview: The overall goal is preserved across long trajectories, rather than being lost through ReAct's "reasoning drift".
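
The tiering argument can be made concrete with a back-of-the-envelope cost model. The sketch below is an assumption-laden simplification: every call is taken to use the same number of tokens (which ignores ReAct's growing prefix), prices are abstract units per token, and all numbers are hypothetical placeholders rather than real model prices.

```python
def react_cost(n_steps: int, tokens_per_call: int, price_big: int) -> int:
    """Pure ReAct: every step is one call to the large model."""
    return n_steps * tokens_per_call * price_big


def plan_execute_cost(n_steps: int, tokens_per_call: int,
                      price_big: int, price_small: int, replans: int = 0) -> int:
    """One planning call plus `replans` large-model calls; each step runs on the small model."""
    big_calls = 1 + replans
    return (big_calls * price_big + n_steps * price_small) * tokens_per_call


# Hypothetical inputs: 10 steps, 1000 tokens per call, large model 10x the small model's price.
baseline = react_cost(10, 1000, price_big=10)                                  # 100000 units
tiered = plan_execute_cost(10, 1000, price_big=10, price_small=1)              # 20000 units
noisy = plan_execute_cost(10, 1000, price_big=10, price_small=1, replans=10)   # 120000 units
```

With these placeholder numbers, tiering is far cheaper as long as the large model is called rarely, and becomes more expensive than the baseline once it is called again after every step.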

Weaknesses and failure modes

Plan-and-Execute is no panacea. The key limitations:

  • Plan brittleness: If the up-front plan is wrong, the Executor wastes calls on hopeless steps until the Replanner notices. The Plan-and-Solve paper itself documents calculation errors, missing steps and semantic misunderstandings.
  • No true parallelism: Plans are lists; the Executor works through them sequentially. This is exactly what LLMCompiler (Kim et al., arXiv:2312.04511) addresses by emitting a DAG instead of a list.
  • Replan costs: Each replan calls the large model again. In noisy environments, the Replanner becomes active at nearly every step and erodes the cost advantage.
  • Model choice is decisive: Smaller Executor models (e.g. gpt-4o-mini) trigger "frequent replanning" according to the LangChain tutorial.
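
To illustrate why a DAG exposes parallelism that a list hides, the sketch below groups steps into batches that could run concurrently. It uses Python's standard `graphlib` module and is not LLMCompiler's implementation; the step names and the helper `parallel_batches` are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group steps into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)  # maps each step to its prerequisites
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependants
    return batches


deps = {
    "scrape_A": set(),
    "scrape_B": set(),
    "scrape_C": set(),
    "summarise": {"scrape_A", "scrape_B", "scrape_C"},
    "send_email": {"summarise"},
}
batches = parallel_batches(deps)
print(batches)
```

A list-based plan would run these five steps one after another; the dependency view shows that the three scrape steps do not depend on each other.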

ReAct vs. Plan-and-Execute in direct comparison

| Dimension | ReAct | Plan-and-Execute |
| --- | --- | --- |
| Planning | Implicit, step by step (reactive) | Explicit, complete plan up front |
| LLM calls | One call per step | One plan call + cheap execution, replan only when needed |
| Token effort (relative to CoT = 1) | approx. 3-10x | approx. 2-6x |
| Latency (N tool steps) | N x sequential | 1 plan + N sequential executions |
| Model tiering | Difficult (one loop, one model) | Natural (Planner large, Executor small) |
| Robustness on long-horizon goals | Weak (reasoning drift) | Strong (goal stays fixed in the plan) |
| Robustness under stochasticity | Strong (reactive) | Weak (plan can become stale) |
| Auditability | Trace per step | Explicit plan up front - compliance-friendly |
| Implementation complexity | Low (a one-liner in LangGraph) | Medium |

The token figures should be understood as orders of magnitude, synthesised from paper and field reports - not direct measurements. Measure on your own workload.
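
One simple way to measure on your own workload is to log every model call with its role and token counts, then total them per role. The sketch below assumes a hypothetical log format of `(role, prompt_tokens, completion_tokens)` tuples; the numbers are placeholders, not measurements.

```python
from collections import defaultdict


def tally_tokens(call_log):
    """Sum prompt and completion tokens per role."""
    totals = defaultdict(int)
    for role, prompt_tokens, completion_tokens in call_log:
        totals[role] += prompt_tokens + completion_tokens
    return dict(totals)


# Placeholder records; in practice these would come from your provider's usage metadata.
log = [
    ("planner", 800, 200),
    ("executor", 300, 100),
    ("executor", 350, 120),
    ("replanner", 900, 150),
]
totals = tally_tokens(log)
print(totals)
```

Splitting the totals by role shows directly whether the Replanner, rather than the Executor, is where the tokens go.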

Pseudocode: the Plan-and-Execute loop

Simplified, modelled on the LangGraph state-schema logic:

```
from dataclasses import dataclass, field

# big_model and small_model are placeholders for an expensive planning model
# and a cheap execution model; they are not defined here.

@dataclass
class State:
    input: str
    plan: list = field(default_factory=list)        # open steps
    past_steps: list = field(default_factory=list)  # completed (step, result) pairs
    response: str = ""

def planner(state):
    # a large model creates the complete plan ONCE
    state.plan = big_model.plan(state.input)  # e.g. ["1. ...", "2. ...", "3. ..."]
    return state

def executor(state):
    step = state.plan[0]                  # next open step
    result = small_model.run_react(step)  # cheap ReAct sub-agent + tools
    state.past_steps.append((step, result))
    return state

def replanner(state):
    # either done -> Response, or output a new remaining plan
    if big_model.is_done(state):
        state.response = big_model.final_answer(state)
        return state, "END"
    state.plan = big_model.replan(state.input, state.past_steps)
    return state, "executor"

# Graph: planner -> executor -> replanner -> (executor | END)
```

The decisive point: big_model (expensive) runs only in planner and replanner, while small_model (cheap) carries the actual execution load.
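
To see the loop run end to end, the following sketch replaces both models with stub functions and counts how often each tier is called. All names (`run_plan_and_execute`, `CallCounter`, the three stub functions) are hypothetical, and the stubs return canned values instead of calling an LLM.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    big: int = 0    # calls to the expensive model (planner, replanner)
    small: int = 0  # calls to the cheap model (executor)


def run_plan_and_execute(task, plan_fn, execute_fn, replan_fn, max_steps=20):
    """Drive planner -> executor -> replanner until the replanner returns a final answer."""
    plan = plan_fn(task)
    past_steps = []
    for _ in range(max_steps):
        if not plan:
            raise ValueError("replanner returned an empty plan without finishing")
        step = plan[0]
        past_steps.append((step, execute_fn(step)))
        done, payload = replan_fn(task, plan, past_steps)
        if done:
            return payload, past_steps
        plan = payload  # revised remaining plan
    raise RuntimeError("step budget exhausted")


calls = CallCounter()


def plan_fn(task):
    calls.big += 1
    return ["collect", "summarise", "send"]


def execute_fn(step):
    calls.small += 1
    return f"{step}: ok"


def replan_fn(task, plan, past_steps):
    calls.big += 1
    remaining = plan[1:]  # stub: drop the finished step, never revise
    if not remaining:
        return True, f"finished {len(past_steps)} steps"
    return False, remaining


answer, trace = run_plan_and_execute("daily report", plan_fn, execute_fn, replan_fn)
print(answer, calls)
```

In this sketch the replanner stub counts as a large-model call after every step, so three steps produce four large-model calls. That is the replan-cost case described under the weaknesses; a cheaper completion check would lower the count.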

A concrete example: daily marketing report

An agency builds an agent that produces a competitive report every morning. Task: "Research the three most important competitors, gather their new content from the last 24 hours, summarise it and send it by email."

With pure ReAct, the agent would call the full model again after every single action (search, scrape, summarise), including the entire context window so far - so with 12 steps, that is 12 expensive calls plus a growing prefix.

With Plan-and-Execute, the Planner creates the plan once (1 expensive call):

```
Plan:

  1. Identify competitors A, B, C
  2. For each: scrape new blog/social content from the last 24h
  3. Summarise briefly per competitor
  4. Format the overall report
  5. Send by email to the team
```

Steps 2-5 then run through a cheap Executor. The Replanner only intervenes if, for example, a source is unreachable. For this kind of decomposable batch task, the token savings fall within the stated range of around 30-60 percent - and the plan is documented in a way the agency can follow.
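
The "growing prefix" effect in the 12-step comparison can be worked through as arithmetic. The sketch below is a toy model under stated assumptions: a fixed base prompt of `base` tokens, `per_step` tokens of new context per completed step, executor calls that only see one step of context, and replanner calls that re-read everything. The numbers are hypothetical and do not reproduce any reported measurement.

```python
def react_input_tokens(n_steps: int, base: int, per_step: int) -> int:
    """Each ReAct call re-reads the base prompt plus the output of all earlier steps."""
    return sum(base + k * per_step for k in range(n_steps))


def plan_execute_input_tokens(n_steps: int, base: int, per_step: int, replans: int = 0) -> int:
    """One planning call, short executor calls, and optional full-context replanner calls."""
    planner = base
    executor = n_steps * (base + per_step)          # each step sees only its own context
    replanner = replans * (base + n_steps * per_step)  # upper bound: re-reads all results
    return planner + executor + replanner


# Hypothetical sizes: 12 steps, 500-token base prompt, 300 tokens of new context per step.
react_total = react_input_tokens(12, base=500, per_step=300)             # 25800
pe_total = plan_execute_input_tokens(12, base=500, per_step=300)         # 10100
pe_with_replans = plan_execute_input_tokens(12, 500, 300, replans=2)     # 18300
```

The ReAct total grows quadratically with the number of steps because of the accumulated prefix, while each replan adds a full-context call and narrows the gap.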

Use in frameworks (as of 2026)

  • LangGraph: Native, with an official tutorial. Canonical state schema with input, plan: List[str], past_steps and response; nodes planner, agent (Executor) and replan. A conditional edge should_end leads back to the Executor or to END.
  • CrewAI: A direct fit via Crew(planning=True). This activates an AgentPlanner that creates a step-by-step plan before each crew iteration and injects it into every task description. With Process.hierarchical plus manager_agent/manager_llm, a managing replanning layer is added.
  • AutoGen / Microsoft Agent Framework: Mapped via Group Chat with a dedicated Planner agent plus worker agents (RoundRobinGroupChat or SelectorGroupChat). In the MS Agent Framework, the SPAR cycle (Sense → Plan → Act → Reflect) is the productised variant. As of 2026, AutoGen is in maintenance mode; Microsoft points new projects to the MS Agent Framework.
  • n8n: No native Plan-and-Execute node, but realistic with two AI Agent nodes (Planner + Executor) connected via a SplitInBatches loop over the plan items. Limitation: n8n loops are not stateful across executions without external storage - suitable for batch jobs (daily reports, content pipelines), not for high-frequency real-time agents.
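
The routing decision behind the `should_end` edge in the LangGraph entry can be sketched without the library. This is a standalone approximation: the `END` constant below is a stand-in defined locally, and the node name `"agent"` follows the naming used in this article. In a real graph, the sentinel would be imported from LangGraph and the function registered as a conditional edge on the replan node.

```python
END = "__end__"  # assumption: local stand-in for the framework's end-of-graph sentinel


def should_end(state: dict) -> str:
    """Route to END once a response is set, otherwise back to the executor node."""
    if state.get("response"):
        return END
    return "agent"


print(should_end({"input": "daily report", "plan": ["send"], "response": ""}))
print(should_end({"input": "daily report", "plan": [], "response": "report sent"}))
```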

When to use it - and when not to

Use Plan-and-Execute when (a) tasks can be decomposed into more than three independent steps, (b) you have a clear way to verify plan validity, and (c) latency is not directly user-visible.

Avoid it when the environment is highly stochastic - if every observation can invalidate the plan (for example interactive web navigation), ReAct's reactive loop is strictly better. For compliance-critical DACH workflows (GDPR, EU AI Act), Plan-and-Execute with a human in the loop is a good fit: an auditable plan and deterministic execution.
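
The criteria above can be written down as a small decision helper. This is a deliberately crude sketch: the function name and parameters are hypothetical, and falling back to ReAct when the criteria are not met reflects this article's advice to start with the simplest pattern, not a general rule.

```python
def suggest_architecture(independent_steps: int, plan_verifiable: bool,
                         latency_user_visible: bool, highly_stochastic: bool) -> str:
    """Apply criteria (a) to (c) and the stochasticity exclusion."""
    if highly_stochastic:
        return "ReAct"  # every observation may invalidate the plan
    if independent_steps > 3 and plan_verifiable and not latency_user_visible:
        return "Plan-and-Execute"
    return "ReAct"  # default to the simpler pattern


# Daily batch report: 5 steps, verifiable plan, nobody waiting on the result.
print(suggest_architecture(5, True, False, False))
# Interactive web navigation: highly stochastic environment.
print(suggest_architecture(8, True, False, True))
```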

For agencies and B2B decision-makers

Plan-and-Execute is the natural pattern for plannable, recurring multi-step processes of the kind that are everyday occurrences in agencies: reporting pipelines, research workflows, content production. The dual gain - measurable cost reduction through model tiering and an up-front, verifiable plan - hits exactly the requirements of compliance and budget control in the DACH B2B environment. The pragmatic advice from production experience applies here too, however: start with the simplest pattern that works (usually ReAct) and only escalate to Plan-and-Execute once measured failure modes - reasoning drift, cost, lack of overview - justify it. If you are evaluating which agent architecture fits your specific use case, Blck Alpaca can support you with selection, model tiering and auditable implementation.

FAQ

What is the difference between Plan-and-Execute and ReAct?
ReAct interleaves thinking, acting and observing in a reactive loop and decides anew after each action. This costs one LLM call per step and risks losing the original goal on long tasks (reasoning drift). Plan-and-Execute creates the complete plan once up front and then executes it, which requires fewer planning calls and keeps the overall goal explicit. ReAct is better in highly stochastic environments, Plan-and-Execute on decomposable long-horizon tasks.
What is the Replanner responsible for in Plan-and-Execute?
The Replanner is invoked after each executed step and makes one of two decisions: either return a final answer and thereby terminate, or output a new, possibly revised, plan for the remaining work. This allows the architecture to respond to unexpected intermediate results. The price: each replan typically calls the large model again - in noisy environments where almost every step triggers a replan, this erodes the cost advantage.
How much cheaper is Plan-and-Execute than ReAct?
According to LangChain's field reports on the Plan-and-Execute agent, the token reduction on multi-tool tasks is empirically on the order of 30-60 percent compared with pure ReAct. These are indicative values, not guaranteed figures - they depend heavily on model choice and task. The biggest lever is model tiering: a GPT-4-class model for planning, a cheaper model (e.g. GPT-4o-mini or Haiku) for execution. Always measure on your own workload.
When should you NOT use Plan-and-Execute?
In highly stochastic environments where almost every observation invalidates the plan, for example interactive web navigation. There, ReAct's reactive loop is strictly superior. It is also a poor fit for latency-critical, user-visible UX (NVIDIA's ACE agent example cites high latency as a real problem in speech pipelines) and for setups with small Executor models, which according to the LangChain tutorial trigger frequent replanning.
Is Plan-and-Execute the same as Anthropic's orchestrator-workers pattern?
Very close in substance. In Building Effective Agents (Dec 2024), Anthropic describes the orchestrator-workers pattern, in which a coordinating LLM dynamically decomposes a task and delegates it to workers - this is, at its core, Plan-and-Execute reframed, and is used in Claude's coding agents. A common hybrid form is Plan-and-Execute with ReAct sub-agents as workers: the orchestrator plans, the ReAct agents execute.
