ReAct (short for Reason + Act) is the agent pattern that interleaves chain-of-thought reasoning with external tool calls. The model writes a thought ("I need to know who won the 2024 election"), takes an action ("search('2024 US election winner')"), reads the observation, and loops. The pattern is simple to describe, hard to get right, and has been the conceptual foundation of basically every LLM agent built since 2023.
The paper came out of Princeton and Google in October 2022, predating ChatGPT. At the time, "agent" meant either a reinforcement-learning policy or a hand-coded planner. ReAct showed that a language model, with the right prompt structure, could behave like a reasoning agent: it could decide what to do next, do it, observe the result, and keep going. That insight unlocked LangChain, AutoGPT, BabyAGI, and the whole tool-using-LLM wave that followed.
In 2026, native tool calling is built into every frontier model. GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro all support multi-turn tool use with structured arguments out of the box. The textual ReAct prompt format is mostly historical curiosity now, but the underlying loop (Thought, Action, Observation, repeat) is exactly how modern agents still work. Understanding ReAct is understanding agents.
TL;DR
- What it is: an agent pattern that interleaves reasoning ("Thought") with tool calls ("Action") and tool results ("Observation"), looping until the model emits a final answer.
- Origin: Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (October 2022, ICLR 2023).
- Why it works: the model both decides what to do and explains why, so each tool call is contextualized by reasoning and each reasoning step is grounded by real observations.
- Modern equivalent: every "tool calling agent" with multi-turn loops is functionally a ReAct agent, even if the prompt format is structured JSON instead of "Thought:/Action:/Observation:".
- When it shines: open-ended tasks that need search, calculation, code execution, or browser actions.
- Failure modes: infinite loops, hallucinated tools, premature finalization, context overflow on long chains.
Origin: Yao et al., 2022
The canonical reference is Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao, "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023 (arXiv October 2022). The paper benchmarks four tasks: HotpotQA (multi-hop QA), FEVER (fact verification), ALFWorld (text-based household tasks), and WebShop (online shopping).
Two prior baselines fell short. Pure chain-of-thought reasoned well but had no way to ground itself in external facts; it hallucinated. Pure act-only agents (pick an action, execute it, repeat, with no reasoning in between) had no plan; they wandered. ReAct combined the two and beat both, often by large margins, especially on tasks that required interleaving knowledge retrieval with reasoning.
The format the paper proposed has become iconic:
Question: Which magazine was started first, Arthur's Magazine or First for Women?
Thought 1: I need to find when Arthur's Magazine was started.
Action 1: search[Arthur's Magazine]
Observation 1: Arthur's Magazine was an American literary periodical published 1844-1846.
Thought 2: Arthur's Magazine started in 1844. I need First for Women's start year.
Action 2: search[First for Women]
Observation 2: First for Women is a women's magazine published by Bauer Media Group, started in 1989.
Thought 3: Arthur's Magazine (1844) started before First for Women (1989).
Action 3: finish[Arthur's Magazine]
That format encodes both reasoning and tool use in a single token stream, which is what makes it learnable by a vanilla LLM with a few examples.
The Thought-Action-Observation loop
Every ReAct run is the same loop.
- The model reads the current context (question, plus any prior thoughts and observations).
- It emits a Thought (free text reasoning about what to do next).
- It emits an Action (one of a predefined set of tools, with arguments).
- The runtime executes the action and returns an Observation.
- The Observation is appended to the context.
- Loop until the model emits a special "finish" action with the final answer.
The model decides when to stop. The runtime enforces hard limits (max steps, max tokens) so a confused model cannot loop forever.
A minimal ReAct agent in plain Python
No framework. Just the OpenAI SDK and a parser.
import re
from openai import OpenAI

client = OpenAI()

SYSTEM = """You are a ReAct agent. Use this format strictly:
Thought: <your reasoning>
Action: <tool>[<input>]
Observation: <provided by environment>
... (this can repeat)
Thought: <final reasoning>
Action: finish[<final answer>]

Available tools:
- search[query]: web search, returns top result snippet
- calc[expression]: evaluates a math expression
- finish[answer]: returns the final answer to the user
"""

def search(q):
    # plug in your real search backend
    return f"(mock result for: {q})"

def calc(expr):
    # eval with empty builtins is restricted, but still not safe for untrusted input
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"search": search, "calc": calc}
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.+?)\]")

def run(question, max_steps=8):
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Question: {question}"},
    ]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stop=["\nObservation:"],  # cut generation before the model invents an observation
        )
        text = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": text})
        m = ACTION_RE.search(text)
        if not m:
            return "Agent emitted no action."
        tool, arg = m.group(1), m.group(2)
        if tool == "finish":
            return arg
        if tool not in TOOLS:
            obs = f"Error: unknown tool '{tool}'."
        else:
            obs = TOOLS[tool](arg)
        messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "Step limit reached."
print(run("What is 17 * (3 + 4)?"))

That is the whole pattern in roughly 40 lines. The stop=["\nObservation:"] parameter is the trick: it cuts generation the moment the model starts to write "Observation:", so the runtime can inject the real one before the model hallucinates it.
This version uses the textual format. In 2026 most teams use native tool calling instead, which moves the parsing inside the model.
The same agent with native tool calling
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Web search. Returns top result snippet.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calc",
            "description": "Evaluate a math expression.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    },
]

# tool implementations; parameter names must match the schemas above
def search(query):
    return f"(mock result for: {query})"

def calc(expression):
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search": search, "calc": calc}

def run(question, max_steps=8):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-5.5",
            messages=messages,
            tools=tools,
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # no tool call means the model is done
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = TOOLS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
    return "Step limit reached."

This is the same Thought-Action-Observation loop. The "Thought" is now hidden inside the model's reasoning (and for GPT-5.5, captured in a separate reasoning field). The "Action" is a structured tool_calls array. The "Observation" is a tool role message. The semantics are identical.
ReAct with LangGraph
For non-trivial agents, you want explicit state, branching, and recovery. LangGraph is the modern default. Here is the rough shape.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search(query: str) -> str:
    """Search the web."""
    return f"(mock result for: {query})"

@tool
def calc(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression, {"__builtins__": {}}))

agent = create_react_agent(
    model=ChatOpenAI(model="gpt-5.5"),
    tools=[search, calc],
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "What is 17 * (3 + 4)?"}]
})

create_react_agent wires up the loop, the tool registry, message persistence, and stop conditions. For more control over the graph (branching, retries, human-in-the-loop checkpoints), you build the StateGraph yourself, as sketched below. See LangChain vs LangGraph for the tradeoffs.
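A minimal hand-rolled sketch of the same loop, reusing the search and calc tools above and assuming the ToolNode and tools_condition helpers from langgraph.prebuilt (the exact surface varies across LangGraph versions):

from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

llm = ChatOpenAI(model="gpt-5.5").bind_tools([search, calc])

def agent_node(state: MessagesState):
    # one Thought/Action step: the model reads the history and may emit tool calls
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode([search, calc]))      # executes tool calls, appends observations
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", tools_condition)  # tool calls -> "tools", otherwise END
graph.add_edge("tools", "agent")                       # loop back with the new observation
custom_agent = graph.compile()

Owning the graph like this is what lets you splice in retry nodes, summarization nodes, or an interrupt before sensitive tools.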
When ReAct beats simpler patterns
A ReAct loop is the right tool when:
- The task requires external information the model does not have (web search, database lookup, current data).
- The task needs deterministic computation the model is bad at (arithmetic, code execution, structured queries).
- The number of steps is variable. If you always need exactly two steps, hard-code them; if it could be one or six, use ReAct.
- You want the model to recover from errors (a bad search query, a failed API call) by trying again with a new strategy.
It is overkill when:
- A single prompt with a single tool call suffices.
- The flow is fixed (extract entities, then classify, then format). Use prompt chaining instead, which is easier to test and cheaper to run; a sketch follows this list.
- You need hard guarantees on which tools fire and in what order. Agents are non-deterministic by design.
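For contrast, a minimal sketch of that fixed flow as a prompt chain, reusing the client from the earlier examples (the prompts and labels are illustrative):

def extract_then_classify(text):
    # step 1: extract entities (one deterministic call, no loop, no tools)
    entities = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"List the named entities in: {text}"}],
    ).choices[0].message.content
    # step 2: classify using step 1's output; the order never varies
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify this entity list as PEOPLE or ORGANIZATIONS:\n{entities}"}],
    ).choices[0].message.content

Every call fires exactly once, in a fixed order, which is what makes this testable in a way an agent loop is not.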
Common failure modes
Infinite loops. The model decides every step that it needs "one more search." Always set a max_steps (5 to 10 for most tasks, 20 to 50 for deep research agents). Log when you hit the limit; that is a failure case.
Hallucinated tools. The model invents wikipedia[topic] when only search[query] exists. With native tool calling this is rare. With textual ReAct prompts it happens; the runtime should return a clear "unknown tool" observation, not silently retry.
Premature finish. The model emits finish[] after one step without enough information. Mitigation: a stronger system prompt ("do not finish until you have verified the answer with at least one search") or a separate evaluator step.
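A sketch of the evaluator mitigation, wrapping the run() function from the minimal agent above; the verification prompt and round count are illustrative assumptions:

def run_with_verifier(question, max_rounds=2):
    answer = "No answer produced."
    for _ in range(max_rounds):
        answer = run(question)
        # separate evaluator call: a cheap model gives a binary verdict
        verdict = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                f"Question: {question}\nProposed answer: {answer}\n"
                "Is this answer complete and well-supported? Reply YES or NO."}],
        ).choices[0].message.content
        if verdict.strip().upper().startswith("YES"):
            return answer
    return answer  # best effort after max_rounds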
Context overflow. A 20-step ReAct trace can blow past 50k tokens. Use trajectory summarization: every N steps, replace older Thought/Observation pairs with a short summary. In LangGraph, add a node to the graph that prunes or summarizes old messages.
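A minimal sketch of trajectory summarization over the messages list from the plain-Python agent, assuming a summarize() helper (hypothetical) that makes one LLM call to compress old steps:

def compact(messages, keep_last=6):
    # keep system prompt + question, summarize the middle, keep recent turns verbatim
    if len(messages) <= 2 + keep_last:
        return messages
    head, middle, tail = messages[:2], messages[2:-keep_last], messages[-keep_last:]
    summary = summarize(middle)  # hypothetical: one LLM call compressing old Thought/Observation pairs
    return head + [{"role": "user", "content": f"Summary of earlier steps: {summary}"}] + tail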
Tool flakiness. A real search API times out. The agent does not know what to do and either loops or finishes with a wrong answer. Wrap tools with retries and clear error observations: "search timed out, try a different query or a different tool."
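A generic retry wrapper for the TOOLS registry above; the backoff parameters and error wording are illustrative:

import time

def with_retries(fn, attempts=3, delay=1.0):
    def wrapped(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception as e:
                if i == attempts - 1:
                    # surface a clear observation instead of crashing the loop
                    return f"Error: tool failed after {attempts} attempts ({e}). Try a different query or tool."
                time.sleep(delay * 2 ** i)  # exponential backoff
    return wrapped

TOOLS = {name: with_retries(fn) for name, fn in TOOLS.items()}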
Reasoning drift. Over long traces, the model's thoughts start contradicting earlier observations. Periodically inject a "summarize what you know so far" step.
Modern relevance: do native tools obsolete ReAct?
No. They change the format, not the pattern.
In 2022, ReAct meant a textual prompt with Thought: / Action: / Observation: labels and a regex parser. In 2026, ReAct means a chat loop with tool_calls in the assistant message and tool role messages for results. The reasoning is internal to the model (in OpenAI's reasoning tokens, Claude's extended thinking, Gemini's internal scratchpad). The actions are structured. The observations are typed.
But the loop is the same. Every "tool calling agent" in production is doing exactly what ReAct described: deciding, acting, observing, deciding again. When you trace a real production agent with LLM tracing, you see the same shape on the screen that Yao et al. drew in their paper.
The conceptual contribution of ReAct (reasoning and acting must be interleaved, not separated) is still the foundation. The format is a 2026 detail.
FAQ
What is the difference between ReAct and chain-of-thought? CoT is reasoning only, in one shot. ReAct interleaves reasoning with tool calls and observations, in a loop. ReAct can ground its reasoning in real data; CoT cannot.
Is ReAct the same as AutoGPT or BabyAGI? AutoGPT and BabyAGI are ReAct-style agents with extra machinery (subgoal decomposition, memory, sometimes a separate planner). The core inner loop is ReAct.
Should I use textual ReAct or native tool calling in 2026? Native tool calling, almost always. It is more reliable, supports parallel tool calls, and integrates with structured outputs. Textual ReAct is only useful on models without native tool support.
How do I evaluate a ReAct agent? Trace every step. Score the final answer for correctness, the trace for efficiency (number of steps, tool calls, total tokens), and individual decisions for quality (was each tool call useful?). See best LLM evaluation tools.
Does ReAct work with RAG? Yes. The "search" tool can be your retrieval pipeline. That combination is essentially agentic RAG, which adds dynamic retrieval to a ReAct loop.
Can ReAct call other agents as tools? Yes. A tool can be another agent. This is the multi-agent pattern: a planner agent calls specialist agents (researcher, coder, reviewer) as tools. The outer loop is still ReAct.
What is the smallest model that runs ReAct reliably? For simple two-tool agents, GPT-4o-mini or Claude Haiku is enough. For multi-step research agents with five-plus tools, Sonnet 4.6 or GPT-5.5 is the practical floor.