2.2 · Intermediate · 7 min

The ReAct Pattern: Thought, Action, Observation

Blck Alpaca · 9 June 2026

Definition

The ReAct pattern (Reasoning and Acting) is an agent design pattern in which an LLM alternates between reasoning (Thought), calling a tool (Action) and reading the result (Observation). This loop repeats until the agent produces a final answer. It was introduced by Yao et al. (2022).

Key Takeaways

✓ ReAct interleaves verbal reasoning (Thought) with tool calls (Action) and their results (Observation) within a single context - introduced by Yao et al. (arXiv:2210.03629, October 2022, ICLR 2023).
✓ Its core strength is the combination of tool use and transparency: the complete Thought/Action/Observation trace is logged and can therefore be audited, which supports DACH compliance (GDPR, EU AI Act).
✓ Typical failure modes are infinite loops, reasoning drift (clinging to a false assumption) and hallucinated tool arguments - especially with weaker models.
✓ The most important countermeasures: hard iteration limits (recursion_limit / max_iterations) and function calling instead of free-text JSON arguments.
✓ The practical upper bound is usually around 10-25 steps before context loss and reasoning drift dominate; ReAct is sequential and therefore latency-bound.
✓ Practical recommendation (Anthropic, Cognition): start with the simplest pattern - usually ReAct - and only escalate when measured failure modes force you to.

The ReAct pattern (Reasoning and Acting) is an agent design pattern in which an LLM alternates between reasoning (Thought), calling a tool (Action) and reading the result back (Observation). This loop repeats until the agent produces a final answer. It was introduced by Yao et al. (arXiv:2210.03629, October 2022, ICLR 2023). The reasoning steers the choice of tool, and the tool observations correct the reasoning.

What it is: A loop of Thought, Action and Observation in which an LLM interleaves thinking and acting and uses tools within the same context.
What it is good for: Tool-supported, factually grounded tasks with low latency on short trajectories and a fully traceable history - the standard starting point for agent projects.
What to watch out for: Infinite loops, reasoning drift and hallucinated tool arguments; mitigated with hard iteration limits and function calling.

Where ReAct comes from and which problem it solves

ReAct emerged as a response to two opposing weaknesses of earlier approaches. Pure Chain-of-Thought (CoT) reasons exclusively internally and therefore hallucinates facts, because it lacks any grounding in the outside world. Pure action agents - such as early WebGPT or SayCan approaches - conversely cannot reason abstractly about long-term goals or recover from exceptions.

ReAct unites both: the verbal reasoning sits in the same context as the actions. This gives the LLM a working memory across the entire trajectory - and makes the agent interpretable. Formally, ReAct expands the action space: alongside external actions that trigger real observations, there are linguistic "actions" (the Thoughts), which change nothing in the environment but only update the agent's internal context. Notably, the pattern already works with one to two in-context examples - without fine-tuning.

The loop in detail: Thought, Action, Observation

The canonical sequence is: User Query → [LLM: Thought] → [LLM: Action] → [Environment: Observation] → [LLM: Thought] → … → Finish[Answer]. The three steps interlock as follows.

| Step | Meaning | Who produces it |
|---|---|---|
| Thought | Free-text reasoning: the agent considers what it needs to do next and why. Updates only the internal context, not the environment. | LLM |
| Action | A concrete tool call with arguments (e.g. search, API call, database query) or `Finish[Answer]` to terminate. | LLM |
| Observation | The result of the action returned by the environment (search hits, API response, error message). Feeds into the next Thought. | Environment / Tool |

This cycle runs until the model emits Action: Finish[Answer]. The practical upper bound is usually around 10-25 steps before context loss or reasoning drift take over.
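When the loop is driven by plain text rather than native function calling, each model output has to be split back into its Thought and Action parts. The following is a minimal sketch, assuming a text format of `Thought: ...` followed by `Action: Tool[argument]`; the `Step` class and `parse_step` function are illustrative names, not part of any framework.

```python
import re
from dataclasses import dataclass

# Assumed output format (illustrative):
#   Thought: <free text>
#   Action: <ToolName>[<argument>]
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<tool>\w+)\[(?P<argument>.*)\]",
    re.DOTALL,
)


@dataclass
class Step:
    thought: str
    tool: str
    argument: str

    @property
    def is_final(self) -> bool:
        # Finish[Answer] terminates the loop.
        return self.tool == "Finish"


def parse_step(llm_output: str) -> Step:
    """Split one model output into Thought, tool name and tool argument."""
    match = STEP_PATTERN.search(llm_output)
    if match is None:
        raise ValueError("expected 'Thought: ... Action: Tool[argument]'")
    return Step(match["thought"].strip(), match["tool"], match["argument"].strip())


step = parse_step("Thought: I need to look it up.\nAction: Search[headcount Company X 2026]")
print(step.tool, "|", step.argument, "|", step.is_final)
```

Raising on malformed output, rather than guessing, keeps format errors visible; a caller could also return the error text to the model as an Observation so it can retry.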

Pseudocode of a ReAct loop

```text
context = system_prompt + tool_descriptions + user_query
step = 0
MAX_STEPS = 15  # hard limit against infinite loops

while step < MAX_STEPS:
    output = LLM(context)  # produces Thought + Action

    if output.action.name == "Finish":
        return output.action.argument  # final answer

    observation = run_tool(output.action)  # external call
    # append Thought, Action and Observation to the history
    context = context + output.thought + output.action + observation
    step += 1

return "Limit reached - no solution found."  # clean abort
```

The central point: in every iteration, the entire history so far is sent to the LLM again. This is the source of both the strength (complete memory) and the most important weakness (cost).
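The pseudocode above can be turned into a small runnable sketch. Everything here is an assumption for illustration: `react_loop`, `LLMOutput`, `fake_llm` and `fake_search` are hypothetical names, the model is a scripted stand-in rather than a real LLM, and the system prompt and tool descriptions are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable

MAX_STEPS = 15  # hard limit against infinite loops


@dataclass
class LLMOutput:
    thought: str
    tool: str  # tool name, or "Finish" to terminate
    argument: str


def react_loop(
    llm: Callable[[str], LLMOutput],
    tools: dict[str, Callable[[str], str]],
    user_query: str,
    max_steps: int = MAX_STEPS,
) -> str:
    context = f"Question: {user_query}\n"
    for _ in range(max_steps):
        output = llm(context)  # the full history is sent on every call
        if output.tool == "Finish":
            return output.argument  # final answer
        tool = tools.get(output.tool)
        # An unknown tool becomes an observation so the model can correct itself.
        observation = tool(output.argument) if tool else f"Unknown tool: {output.tool}"
        context += (
            f"Thought: {output.thought}\n"
            f"Action: {output.tool}[{output.argument}]\n"
            f"Observation: {observation}\n"
        )
    return "Limit reached - no solution found."  # clean abort


# Scripted stand-in for a real model: it answers once it has seen an observation.
def fake_llm(context: str) -> LLMOutput:
    if "Observation:" in context:
        return LLMOutput("That answers the question.", "Finish", "around 1,200 employees")
    return LLMOutput("I need to look it up.", "Search", "headcount Company X")


def fake_search(query: str) -> str:
    return "Press release: around 1,200 employees"


print(react_loop(fake_llm, {"Search": fake_search}, "How many employees does Company X have?"))
# prints: around 1,200 employees
```

Because `context` only ever grows, each call to `llm` receives a longer string than the one before, which is the cost effect described above.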

Strengths and limitations from the research

The original paper documents ReAct's advantages across several benchmarks - importantly, the figures stem from the conditions of 2022/2023 (GPT-3 and PaLM class) and should be read as relative values, not as today's absolute values.

ALFWorld (text-based household tasks): +34 absolute percentage points in success rate over imitation/RL baselines, prompted with only one or two in-context examples.
WebShop (e-commerce navigation): +10 absolute percentage points in success rate over IL/IL+RL.
HotpotQA and FEVER: competitive with or better than pure CoT or pure action generation; the strongest configuration is a hybrid that switches between internal knowledge and tool-supported reasoning.

On the cost side, ReAct is expensive and slow. Token cost grows as O(N · T), where N is the number of steps and T is the context length per call: every step pays again for the prefix (system prompt, tool descriptions, entire history), and because the history itself grows with each step, the total rises faster than linearly in N. Latency is strictly sequential, i.e. roughly N × (LLM response time + tool response time). This prefix repetition is exactly the weakness that the successor pattern ReWOO addresses by getting by with just two LLM calls.
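To make the growth concrete, here is a back-of-the-envelope model. It assumes a fixed prefix of P tokens and a history that grows by about t tokens per step, so the input tokens over N calls sum to N·P + t·N(N−1)/2. The function names and the sample numbers (1,500 prefix tokens, 400 tokens per step) are illustrative assumptions, not measurements.

```python
def total_input_tokens(steps: int, prefix_tokens: int, tokens_per_step: int) -> int:
    """Sum of the context sizes sent over all LLM calls of one trajectory."""
    # Call i (0-based) sends the prefix plus the i steps accumulated so far.
    return sum(prefix_tokens + i * tokens_per_step for i in range(steps))


def estimated_latency_s(steps: int, llm_seconds: float, tool_seconds: float) -> float:
    """Strictly sequential: every step waits for the LLM, then for the tool."""
    return steps * (llm_seconds + tool_seconds)


for n in (5, 10, 20):
    print(n, total_input_tokens(n, prefix_tokens=1500, tokens_per_step=400))
# prints:
# 5 11500
# 10 33000
# 20 106000
```

Under these assumptions, doubling the number of steps from 10 to 20 roughly triples the input tokens, which is one reason the practical step limit matters.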

Failure modes and countermeasures

Three failure modes occur particularly frequently in practice:

Infinite loops: Without a hard step limit, the agent can reason in circles indefinitely. Countermeasure: explicit limits - recursion_limit in LangGraph, max_iterations in AutoGen, max_reasoning_attempts in CrewAI. LangGraph's create_react_agent offers a remaining_steps channel and aborts with a notice message when fewer than two steps remain.
Reasoning drift / error propagation: Once the model has committed to a false Thought, it interprets subsequent observations so that they fit it. Countermeasure: shorter trajectories, intermediate validation and - for verifiable tasks - an external reflection/check signal (a bridge to the Reflexion pattern).
Hallucinated tool arguments and hallucinated observations: With ambiguous tool descriptions the model invents arguments; weaker models sometimes also "fabricate" observations instead of waiting for the real tool result. According to research, malformed JSON tool arguments are the most common production error on weaker models. Countermeasure: use structured output or function-calling mode whenever the provider supports it, and phrase tool descriptions precisely.

A fourth issue is the context bloat already mentioned: every step resends the complete history, so the cost of each additional iteration keeps rising.
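Where native function calling is not available, tool arguments can at least be checked before a tool runs. The sketch below assumes the model emits arguments as a JSON object; `TOOL_SCHEMAS` and `validate_tool_call` are hypothetical names, and the schema format is deliberately simplified compared with a real JSON Schema validator.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "search": {"query": str},
    "open_url": {"url": str},
}


def validate_tool_call(name: str, raw_arguments: str) -> tuple[bool, str]:
    """Return (ok, message); the message can be fed back as an Observation."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return False, f"Unknown tool '{name}'. Available: {sorted(TOOL_SCHEMAS)}"
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as error:
        return False, f"Arguments are not valid JSON: {error.msg}"
    if not isinstance(arguments, dict):
        return False, "Arguments must be a JSON object."
    missing = [key for key in schema if key not in arguments]
    unexpected = [key for key in arguments if key not in schema]
    wrong_type = [
        key for key, expected in schema.items()
        if key in arguments and not isinstance(arguments[key], expected)
    ]
    if missing or unexpected or wrong_type:
        return False, f"missing={missing}, unexpected={unexpected}, wrong_type={wrong_type}"
    return True, "ok"


print(validate_tool_call("search", '{"query": "headcount Company X"}'))
print(validate_tool_call("search", '{"q": "headcount Company X"}'))
```

Returning the error message instead of raising lets the agent see what went wrong in its next Thought, while the hard step limit still bounds the number of retries.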

A concrete example: a research agent for a marketing agency

Suppose an agency builds an agent that researches the current headcount of a client company. The loop might run as follows:

Thought: "I do not know the current figure and need to look it up." → Action: Search["headcount Company X 2026"] → Observation: hit with a press release.
Thought: "The release states a range, I need the exact figure." → Action: Open[URL of the press release] → Observation: full text with "around 1,200 employees".
Thought: "That answers the question." → Action: Finish["approx. 1,200 employees, as of 2026"].

Three steps, three LLM calls, a fully logged history. If you set MAX_STEPS = 15, the agent is protected against infinite loops - if it finds nothing, it aborts cleanly instead of burning tokens. It is precisely this traceable sequence that is a real advantage for marketing teams without a developer background: in n8n, for example, the execution view shows every Thought/Action/Observation step as an auditable log.

Frameworks and practical recommendations (as of 2026)

ReAct is natively available in all common stacks: LangGraph (create_react_agent), CrewAI (every agent is internally a ReAct agent), AutoGen or the Microsoft Agent Framework (AutoGen is in maintenance mode as of 2026; Microsoft directs new projects to the Agent Framework) and n8n (ReAct AI Agent as well as the newer Tools Agent, which is recommended for modern function-calling models). In the LangChain ecosystem, create_react_agent in langgraph.prebuilt is, as of 2026, being superseded by langchain.agents.create_agent, which adds middleware decorators.

The most important practical insight comes from Anthropic ("Building Effective Agents", December 2024) and Cognition: start with the simplest pattern - usually ReAct - and only escalate when measured failure modes force you to. Modern frontier models handle the reasoning-action loop natively via function calling; explicit ReAct prompting is often unnecessary. What matters then is memory, iteration limits and tracing. Observability tools (LangSmith, Arize Phoenix, Langfuse) are effectively considered mandatory - for DACH compliance (GDPR, EU AI Act) the complete trace must be persisted and PII-cleaned.
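Persisting a PII-cleaned trace can be sketched with the standard library alone. This is an illustration under stated assumptions: the two regular expressions only catch simple e-mail addresses and phone numbers, the function names are hypothetical, and a production setup would rely on a dedicated observability tool and a proper PII detection step.

```python
import json
import re
from datetime import datetime, timezone

# Deliberately simple patterns; real PII detection needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s/()-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def trace_record(step: int, thought: str, action: str, observation: str) -> str:
    """Serialize one Thought/Action/Observation step as a JSON line."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "thought": redact(thought),
            "action": redact(action),
            "observation": redact(observation),
        },
        ensure_ascii=False,
    )


def append_trace(path: str, record: str) -> None:
    """Append one record to a JSON Lines file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(record + "\n")


print(redact("Contact jane.doe@example.com or +49 30 1234567."))
# prints: Contact [EMAIL] or [PHONE].
```

Redacting before the record is written, rather than afterwards, means unredacted text never reaches the log file in this sketch.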

For agencies and B2B decision-makers

ReAct is the pragmatic entry point for almost any agent project: customer chatbots with CRM and knowledge-base integration, support triage or research tasks. It is cheap to start, fast on short tasks and fully auditable - three properties that count in the DACH B2B environment. Set hard iteration limits from the outset, use function calling instead of free-text arguments and persist the trace for audit and debugging. If, as an agency, you want to design an agent or have an existing workflow checked for stability, get in touch - we support selection, architecture and compliant implementation.

FAQ

What does ReAct mean?

ReAct stands for Reasoning and Acting. The LLM interleaves free-text reasoning (Thought) with tool calls (Action) and reads their results back (Observation). These three steps form a loop that runs until the agent produces a final answer. The pattern was introduced by Yao et al. in 2022.

How does ReAct differ from Chain-of-Thought (CoT)?

Chain-of-Thought reasons purely internally, without access to the outside world, and can therefore hallucinate facts. ReAct grounds the reasoning through real tool observations: the reasoning steers the choice of tool, and the tool results in turn correct the reasoning. ReAct thus combines abstract reasoning with factual grounding.

What are the most common failure modes of the ReAct pattern?

The most common problems are infinite loops without a hard step limit, reasoning drift (the agent clings to a false assumption and interprets later observations to fit it), hallucinated tool arguments arising from ambiguous tool descriptions, and context bloat, since every step resends the entire history so far.

How do you prevent infinite loops in ReAct agents?

Through hard iteration limits: in LangGraph you set recursion_limit, in AutoGen and CrewAI max_iterations or max_reasoning_attempts respectively. LangGraph's create_react_agent provides a remaining_steps channel and aborts when too few steps remain. Every production agent should have an explicit limit.

In which frameworks is ReAct available?