The ReAct Pattern: Thought, Action, Observation
The ReAct pattern (Reasoning and Acting) is an agent design pattern in which an LLM alternates between reasoning (Thought), calling a tool (Action) and reading the result (Observation). This loop repeats until the agent produces a final answer. The pattern was introduced by Yao et al. (2022).
Key Takeaways
- ReAct interleaves verbal reasoning (Thought) with tool calls (Action) and their results (Observation) within a single context. It was introduced by Yao et al. (arXiv:2210.03629, October 2022, ICLR 2023).
- Its core strength is the combination of tool use and transparency: the complete Thought/Action/Observation trace can be logged and audited, which supports DACH compliance (GDPR, EU AI Act).
- Typical failure modes are infinite loops, reasoning drift (clinging to a false assumption) and the hallucination of tool arguments - especially with weaker models.
- The most important countermeasures: hard iteration limits (recursion_limit / max_iterations) and function calling instead of free-text JSON arguments.
- The practical upper bound is usually around 10-25 steps before context loss and reasoning drift dominate; ReAct is sequential and therefore latency-bound.
- Practical recommendation (Anthropic, Cognition): start with the simplest pattern - usually ReAct - and only escalate when measured failure modes force you to.
The ReAct pattern (Reasoning and Acting) is an agent design pattern in which an LLM alternates between reasoning (Thought), calling a tool (Action) and reading the result back (Observation). This loop repeats until the agent produces a final answer. It was introduced by Yao et al. (arXiv:2210.03629, October 2022, ICLR 2023). The reasoning steers the choice of tool, and the tool observations correct the reasoning.
- What it is: A loop of Thought, Action and Observation in which an LLM interleaves thinking and acting and uses tools within the same context.
- What it is good for: Tool-supported, factually grounded tasks with low latency and a fully traceable history - the standard starting point for agent projects.
- What to watch out for: Infinite loops, reasoning drift and hallucinated tool arguments; mitigated with hard iteration limits and function calling.
Where ReAct comes from and which problem it solves
ReAct emerged as a response to two opposing weaknesses of earlier approaches. Pure Chain-of-Thought (CoT) reasons exclusively internally and therefore hallucinates facts, because it lacks any grounding in the outside world. Pure action agents - such as early WebGPT or SayCan approaches - conversely cannot reason abstractly about long-term goals or recover from exceptions.
ReAct unites both: the verbal reasoning sits in the same context as the actions. This gives the LLM a working memory across the entire trajectory - and makes the agent interpretable. Formally, ReAct expands the action space: alongside external actions that trigger real observations, there are linguistic "actions" (the Thoughts), which change nothing in the environment but only update the agent's internal context. Notably, the pattern already works with one to two in-context examples - without fine-tuning.
The loop in detail: Thought, Action, Observation
The canonical sequence is: User Query → [LLM: Thought] → [LLM: Action] → [Environment: Observation] → [LLM: Thought] → … → Finish[Answer]. The three steps interlock as follows.
| Step | Meaning | Who produces it |
|---|---|---|
| Thought | Free-text reasoning: the agent considers what it needs to do next and why. Updates only the internal context, not the environment. | LLM |
| Action | A concrete tool call with arguments (e.g. search, API call, database query) or the terminating Finish[Answer]. | LLM |
| Observation | The result of the action returned by the environment (search hits, API response, error message). Feeds into the next Thought. | Environment / Tool |
This cycle runs until the model emits Action: Finish[Answer]. The practical upper bound is usually around 10-25 steps before context loss or reasoning drift take over.
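With text-based ReAct prompting, each model output has to be split into its Thought and its Action before a tool can run. The following is a minimal sketch, assuming the model writes `Thought: ...` followed by `Action: Tool[argument]`, as in the canonical sequence above. The names `parse_step`, `Step` and `STEP_PATTERN` are hypothetical and not taken from any framework.

```python
import re
from dataclasses import dataclass

# Assumed output format: "Thought: <text>\nAction: <Tool>[<argument>]"
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<tool>\w+)\[(?P<argument>.*)\]\s*$",
    re.DOTALL,
)


@dataclass
class Step:
    thought: str
    tool: str
    argument: str


def parse_step(llm_output: str) -> Step:
    """Split one model output into Thought and Action; raise if malformed."""
    match = STEP_PATTERN.search(llm_output)
    if match is None:
        raise ValueError("output does not follow the Thought/Action format")
    return Step(match["thought"], match["tool"], match["argument"])


step = parse_step("Thought: I need the figure.\nAction: Search[headcount Company X]")
print(step.tool, "->", step.argument)
```

A malformed output raises a `ValueError` here; in a real loop that error would typically be fed back to the model as an Observation rather than ending the run.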
Pseudocode of a ReAct loop
```text
context = system_prompt + tool_descriptions + user_query
step = 0
MAX_STEPS = 15  # hard limit against infinite loops
while step < MAX_STEPS:
    output = LLM(context)  # produces Thought + Action
    if output.action.name == "Finish":
        return output.action.argument  # final answer
    observation = run_tool(output.action)  # external call
    context = (context + output.thought
               + output.action
               + observation)  # append thought, action and observation
    step += 1
return "Limit reached - no solution found."  # clean abort
```
The central point: in every iteration, the entire history so far is sent to the LLM again. This is the source of both the strength (complete memory) and the most important weakness (cost).
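To make the loop concrete, here is a runnable sketch in Python. It is an illustration under assumptions, not a reference implementation: `react_loop` and `fake_llm` are hypothetical names, and the "model" is a scripted stand-in that returns a fixed (thought, tool, argument) tuple instead of calling a real LLM.

```python
from typing import Callable, Dict, Tuple

MAX_STEPS = 15  # hard limit against infinite loops


def react_loop(
    llm: Callable[[str], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    query: str,
    max_steps: int = MAX_STEPS,
) -> str:
    """Run Thought/Action/Observation until Finish or the step limit."""
    context = f"Question: {query}\n"
    for _ in range(max_steps):
        # The full history so far is passed to the model on every call.
        thought, tool, argument = llm(context)
        if tool == "Finish":
            return argument
        if tool in tools:
            observation = tools[tool](argument)
        else:
            # Unknown tools are reported back to the model, not raised.
            observation = f"Error: unknown tool {tool!r}"
        context += (
            f"Thought: {thought}\nAction: {tool}[{argument}]\n"
            f"Observation: {observation}\n"
        )
    return "Limit reached - no solution found."


def fake_llm(context: str) -> Tuple[str, str, str]:
    """Scripted stand-in for a model: searches once, then finishes."""
    if context.count("Observation:") == 0:
        return ("I need to look it up.", "Search", "headcount Company X")
    return ("That answers the question.", "Finish", "approx. 1,200 employees")


tools = {"Search": lambda query: "Press release: around 1,200 employees"}
print(react_loop(fake_llm, tools, "How many employees does Company X have?"))
```

Because `context` only ever grows, every additional step makes the next model call more expensive, which is the cost effect described above.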
Strengths and limitations from the research
The original paper documents ReAct's advantages across several benchmarks - importantly, the figures stem from the conditions of 2022/2023 (GPT-3 and PaLM class) and should be read as relative values, not as today's absolute values.
- ALFWorld (text-based household tasks): +34 absolute percentage points over imitation/RL baselines, with one to two examples.
- WebShop (e-commerce navigation): +10 absolute percentage points in success rate over IL/IL+RL.
- HotpotQA and FEVER: competitive with or better than pure CoT or pure action generation; the strongest configuration is a hybrid that switches between internal knowledge and tool-supported reasoning.
On the cost side, ReAct is expensive and slow. The token cost grows with O(N · T): N steps times the cumulative context length - every step pays for the prefix (system prompt, tool descriptions, entire history) again. The latency is strictly sequential, i.e. roughly N × (LLM response time + tool response time). It is precisely this prefix repetition that is the weakness addressed by the successor pattern ReWOO through just two LLM calls.
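The effect of prefix repetition can be estimated with a few lines of arithmetic. The sketch below assumes a simplified model in which every step adds the same number of tokens to the history; the function name `total_prompt_tokens` and all numbers are illustrative assumptions, not measurements.

```python
def total_prompt_tokens(prefix_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Sum the prompt sizes of all LLM calls when the full history is resent.

    Call i (0-based) sends the fixed prefix plus the i steps completed so far.
    """
    return sum(prefix_tokens + i * tokens_per_step for i in range(steps))


# Illustrative numbers only: 2,000-token prefix, 300 tokens added per step.
for n in (5, 10, 20):
    print(n, total_prompt_tokens(2000, 300, n))
```

Under these assumptions, 10 steps cost 33,500 prompt tokens and 20 steps cost 97,000: doubling the number of steps nearly triples the input tokens, because the history term grows quadratically with N.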
Failure modes and countermeasures
Three failure modes occur particularly frequently in practice:
- Infinite loops: Without a hard step limit, the agent can reason in circles indefinitely. Countermeasure: explicit limits - recursion_limit in LangGraph, max_iterations in AutoGen, max_reasoning_attempts in CrewAI. LangGraph's create_react_agent offers a remaining_steps channel and aborts with a notice message when fewer than two steps remain.
- Reasoning drift / error propagation: Once the model has committed to a false Thought, it interprets subsequent observations so that they fit it. Countermeasure: shorter trajectories, intermediate validation and - for verifiable tasks - an external reflection/check signal (a bridge to the Reflexion pattern).
- Hallucinated tool arguments and hallucinated observations: With ambiguous tool descriptions the model invents arguments; weaker models sometimes also "fabricate" observations instead of waiting for the real tool result. According to research, malformed JSON tool arguments are the most common production error on weaker models. Countermeasure: use structured output or function-calling mode whenever the provider supports it, and phrase tool descriptions precisely.
Added to this is the context bloat already mentioned: every step resends the complete history.
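Where the provider's function-calling mode is not available, a defensive check on the model's JSON arguments catches many of these errors before a tool runs. The following is a minimal sketch using only the standard library; `SEARCH_SCHEMA`, `validate_arguments` and `safe_observation` are hypothetical names, and the schema is a plain name-to-type mapping rather than a full JSON Schema.

```python
import json
from typing import Any, Dict

# Hypothetical tool schema: argument name -> required Python type.
SEARCH_SCHEMA: Dict[str, type] = {"query": str, "max_results": int}


def validate_arguments(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse model-produced JSON arguments and check them against the schema.

    Raises ValueError with a message that can be returned as an Observation.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc.msg}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    unknown = set(args) - set(schema)
    missing = set(schema) - set(args)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
    for name, expected in schema.items():
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(args[name], expected) or (
            expected is int and isinstance(args[name], bool)
        ):
            raise ValueError(f"{name!r} must be of type {expected.__name__}")
    return args


def safe_observation(raw: str) -> str:
    """Turn a validation failure into an Observation instead of crashing the loop."""
    try:
        args = validate_arguments(raw, SEARCH_SCHEMA)
    except ValueError as exc:
        return f"Error: {exc}"
    return f"OK: would search for {args['query']!r}"


print(safe_observation('{"query": "Company X headcount", "max_results": 3}'))
print(safe_observation('{"query": "Company X headcount", "max_results": "three"}'))
```

Returning the error text as an Observation gives the model a chance to correct its own arguments in the next Thought, while the hard step limit still bounds how often it may retry.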
A concrete example: a research agent for a marketing agency
Suppose an agency builds an agent that researches the current headcount of a client company. The loop might run as follows:
- Thought: "I do not know the current figure and need to look it up." → Action: Search["headcount Company X 2026"] → Observation: hit with a press release.
- Thought: "The release states a range, I need the exact figure." → Action: Open[URL of the press release] → Observation: full text with "around 1,200 employees".
- Thought: "That answers the question." → Action: Finish["approx. 1,200 employees, as of 2026"].
Three steps, three LLM calls, a fully logged history. If you set MAX_STEPS = 15, the agent is protected against infinite loops - if it finds nothing, it aborts cleanly instead of burning tokens. It is precisely this traceable sequence that is a real advantage for marketing teams without a developer background: in n8n, for example, the execution view shows every Thought/Action/Observation step as an auditable log.
Frameworks and practical recommendations (as of 2026)
ReAct is natively available in all common stacks: LangGraph (create_react_agent), CrewAI (every agent is internally a ReAct agent), AutoGen or the Microsoft Agent Framework (AutoGen is in maintenance mode as of 2026; Microsoft directs new projects to the Agent Framework) and n8n (ReAct AI Agent as well as the newer Tools Agent recommended for modern function-calling models). In LangChain, create_react_agent is moving as of 2026 from langgraph.prebuilt to langchain.agents.create_agent with middleware decorators.
The most important practical insight comes from Anthropic ("Building Effective Agents", December 2024) and Cognition: start with the simplest pattern - usually ReAct - and only escalate when measured failure modes force you to. Modern frontier models handle the reasoning-action loop natively via function calling; explicit ReAct prompting is often unnecessary. What matters then is memory, iteration limits and tracing. Observability tools (LangSmith, Arize Phoenix, Langfuse) are effectively considered mandatory - for DACH compliance (GDPR, EU AI Act) the complete trace must be persisted and PII-cleaned.
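Persisting a PII-cleaned trace can be sketched as follows. This is an illustration under strong assumptions: the trace is a list of dicts with string values, the output format is JSON Lines, and "PII cleaning" is reduced to masking e-mail addresses with a simple regular expression. The names `redact` and `persist_trace` are hypothetical, and real compliance work needs far broader detection than this.

```python
import io
import json
import re
from typing import Dict, List, TextIO

# Deliberately simple pattern: catches common e-mail addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask e-mail addresses in a single trace field."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def persist_trace(steps: List[Dict[str, str]], sink: TextIO) -> None:
    """Write one JSON object per step (JSON Lines) with redacted field values."""
    for index, step in enumerate(steps, start=1):
        record = {"step": index, **{key: redact(value) for key, value in step.items()}}
        sink.write(json.dumps(record, ensure_ascii=False) + "\n")


trace = [
    {
        "thought": "I need the press contact.",
        "action": "Search[press contact Company X]",
        "observation": "Contact: jane.doe@example.com",
    },
]
buffer = io.StringIO()  # stand-in for a file or log store
persist_trace(trace, buffer)
print(buffer.getvalue(), end="")
```

Redacting before writing, rather than after, keeps raw personal data out of the log store in the first place.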
For agencies and B2B decision-makers
ReAct is the pragmatic entry point for almost any agent project: customer chatbots with CRM and knowledge-base integration, support triage or research tasks. It is cheap to start, low-latency and fully auditable - three properties that count in the DACH B2B environment. Set hard iteration limits from the outset, use function calling instead of free-text arguments and persist the trace for audit and debugging. If, as an agency, you want to design an agent or have an existing workflow checked for stability, get in touch - we support selection, architecture and compliant implementation.