How to Implement the ReAct Pattern for Reliable AI Agents
The ReAct pattern combines reasoning with action, letting agents solve problems by thinking before they act. This guide explains how to build the ReAct loop in Python, why grounding answers in tool output sharply reduces hallucinations, and how to persist agent thought traces for debugging. By separating reasoning from action, developers can build agents that fix their own mistakes and handle ambiguity.
What Is the ReAct Pattern?
The ReAct pattern (Reasoning and Acting) is a framework that makes Large Language Models (LLMs) explain their thinking before they act. Instead of guessing an answer, the model enters a loop of reasoning, acting, and observing.
In a standard LLM chat, the model predicts the next word based on its training. This often causes "hallucinations" or confident errors when the model sees new information. In a ReAct loop, the model follows three steps for every move:
- Thought: The agent looks at the current state, states what it knows, and plans the next step.
- Action: The agent runs a specific tool command (e.g., search_web, query_database, read_file) to get missing information.
- Observation: The agent receives the output from that tool and updates its context with this real data.
The "Internal Monologue" Advantage
This process creates an internal logic that lets the agent catch its own mistakes. If a search result returns no data, the "Thought" step in the next cycle can note the failure and try a different search, rather than making up an answer.
For example, if asked "Who is the CEO of Fast.io?", a standard model might guess based on old training data. A ReAct agent would:
- Thought: "I need to find the current CEO of Fast.io. I will search for this information."
- Action: search_google("current CEO of Fast.io")
- Observation: "Results show [Name] is the CEO..."
- Thought: "I found the answer."
- Final Answer: "The CEO is [Name]."
Why Reliability Requires Reasoning
The main benefit of the ReAct pattern is that it connects model outputs to real facts. Because the model must show its work and check facts with tools, reliability improves compared to standard prompts.
ReAct vs. Chain-of-Thought (CoT)
Chain-of-Thought prompting asks models to "think step-by-step," which helps with logic puzzles but fails when the task requires outside facts. ReAct fixes this by adding the "Act" and "Observe" steps.
- Hallucination Reduction: The original 2022 ReAct paper from Google Research and Princeton, published on arXiv, reported substantially lower hallucination rates with ReAct than with standard prompting methods.
- Fact Verification: In benchmarks like HotPotQA, ReAct agents consistently beat standard CoT models by getting real-time data rather than relying on old training data.
- Debuggability: When a ReAct agent fails, you can read the log to see exactly where it went wrong. Did it have the wrong thought? Did it call the tool with bad arguments? Did the tool return an error? This visibility is impossible with a single-shot prompt.
This structured approach changes an LLM from a text generator into a decision-making engine that handles changing situations.
Building the ReAct Loop in Python
Building a basic ReAct agent doesn't require a large framework like LangChain or AutoGPT, though they can be useful. Understanding the raw loop matters for debugging and customization. At its core, a ReAct agent is a while loop that appends the thought-action-observation history to the prompt context until a stop condition is met.
1. The Prompt Template
The key part is the system prompt. It must define the output format. If the model breaks the format, the regex parser will fail.
SYSTEM_PROMPT = """
You are a helpful assistant with access to the following tools:
{tools_description}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
"""
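The {tools_description} and {tool_names} placeholders must be rendered from whatever tool registry you maintain. A minimal sketch, assuming an illustrative registry layout (the tool names and dict shape here are choices for the example, not a standard):

```python
# Illustrative tool registry: name -> (function, one-line description).
TOOLS = {
    "search": (lambda q: f"Results for: {q}", "Search the web for a query."),
    "calculator": (lambda e: str(e), "Evaluate an arithmetic expression."),
}

def build_system_prompt(template, tools):
    """Render the {tools_description} and {tool_names} placeholders."""
    tools_description = "\n".join(
        f"{name}: {desc}" for name, (_fn, desc) in tools.items()
    )
    tool_names = ", ".join(tools)
    return template.format(
        tools_description=tools_description, tool_names=tool_names
    )
```

Keeping the registry as a single dict means the prompt and the tool router can never drift apart: both are generated from the same source of truth.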
2. The Main Loop
The Python code uses a loop to manage the conversation history.
def run_agent(question, max_steps=10):
    history = f"Question: {question}\n"
    for _ in range(max_steps):
        # 1. Get the model's next response
        response = llm.generate(system=SYSTEM_PROMPT, prompt=history)
        # 2. Check if we are done
        if "Final Answer:" in response:
            return extract_final_answer(response)
        # 3. Parse the Action and Action Input
        action, action_input = parse_response(response)
        # 4. Execute the tool
        observation = execute_tool(action, action_input)
        # 5. Update history
        history += f"{response}\nObservation: {observation}\n"
    return "Timeout: Maximum steps reached."
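The loop above calls parse_response and extract_final_answer without defining them. A minimal regex-based sketch, assuming the prompt format shown earlier (the ParseError type and exact patterns are one reasonable choice, not a standard):

```python
import re

class ParseError(ValueError):
    """Raised when the model output does not match the expected format."""

def parse_response(response):
    # Match "Action:" and "Action Input:" lines; . does not cross newlines,
    # so each group captures the remainder of its own line.
    action = re.search(r"Action:\s*(.+)", response)
    action_input = re.search(r"Action Input:\s*(.+)", response)
    if not action or not action_input:
        raise ParseError("Missing 'Action:' or 'Action Input:' line.")
    return action.group(1).strip(), action_input.group(1).strip()

def extract_final_answer(response):
    # Everything after the last "Final Answer:" marker.
    return response.rsplit("Final Answer:", 1)[-1].strip()
```

Raising a dedicated ParseError (rather than crashing on a None match) is what later makes the "feedback on parsing failure" fix possible.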
3. Tool Execution
You need a router function that takes the parsed string and calls the actual Python function.
def execute_tool(action_name, input_str):
    if action_name == "search":
        return google_search(input_str)
    elif action_name == "calculator":
        return eval(input_str)  # Warning: use a safe evaluator in production
    else:
        return f"Error: Tool {action_name} not found."
This loop is the core of the agent. It gives the model "eyes" and "hands" to interact with the world, managed by the text added to history.
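The eval call in the calculator branch is an injection risk once inputs come from a model. A safer arithmetic-only evaluator can be built on the standard-library ast module by whitelisting operators; this is a sketch covering basic arithmetic, not a full expression language:

```python
import ast
import operator

# Whitelist of permitted binary and unary operators.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr):
    """Evaluate a plain arithmetic expression; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("Disallowed expression")
    return walk(ast.parse(expr, mode="eval").body)
```

Anything outside the whitelist, such as function calls or attribute access, raises ValueError instead of executing.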
Common Pitfalls and How to Fix Them
While the concept is simple, real-world ReAct agents often break. Here are the most common problems and how to fix them.
The "I Need to Search" Loop
Problem: The agent gets stuck in a loop, searching for the same term again and again without making progress.
Fix: Specific instructions and history truncation.
- System Prompt: Add a directive: "If you observe the same result twice, try a different search term or strategy."
- Max Steps: Always set a strict max_steps limit (e.g., 10) to prevent infinite loops from draining your API credits.
Parsing Failures
Problem: The model outputs "Action: search" but forgets "Action Input:", or uses the wrong capitalization, causing your regex parser to crash.
Fix: Better parsing with feedback. Instead of crashing, catch the parsing error and send it back to the model as an Observation.
- Observation: "Error: Invalid format. You must provide 'Action Input:'. Please try again." This allows the model to fix its format in the next turn.
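Wired into the main loop, this fix replaces a hard crash with a corrective message appended to history. A minimal helper (the exact feedback wording is a choice, not a requirement):

```python
def format_feedback(history, response, error):
    """Convert a parsing failure into an Observation the model can
    react to on the next turn, instead of crashing the loop."""
    observation = (
        f"Error: Invalid format ({error}). You must provide both "
        "'Action:' and 'Action Input:'. Please try again."
    )
    return history + f"{response}\nObservation: {observation}\n"
```

In the loop, wrap the parse step in try/except and call this helper in the except branch before continuing to the next iteration.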
Context Window Overflow
Problem: For long tasks, the history string grows beyond the model's context window.
Fix: FIFO (First-In-First-Out) sliding window or summarization.
- Sliding Window: Keep the system prompt and the Question, but drop the oldest Thought/Observation pairs when the limit approaches.
- Summarization: Use a separate LLM call to summarize the middle of the conversation history into a concise "Memory" block.
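The sliding-window option can be sketched with a simple character budget. Production code should count tokens with the model's tokenizer and drop whole Thought/Observation steps rather than single lines; the 8,000-character budget here is illustrative:

```python
def truncate_history(history, max_chars=8000, keep_prefix_lines=1):
    """Crude FIFO sliding window over the history string: keep the
    leading Question line, drop the oldest lines until it fits."""
    lines = history.splitlines(keepends=True)
    prefix = lines[:keep_prefix_lines]  # e.g. the "Question:" line
    body = lines[keep_prefix_lines:]
    while body and sum(len(l) for l in prefix + body) > max_chars:
        body.pop(0)  # drop the oldest line first
    return "".join(prefix + body)
```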
The Missing Piece: Persistent State
A big problem with standard tutorials is that the "thought trace" stays in RAM. If the script crashes, the server restarts, or the container is killed, the agent's reasoning history is lost. This makes debugging long-running agents hard.
Storing Thoughts as Files
For reliable production agents, the "Thought" and "Observation" logs need to be written to persistent storage immediately.
- Logs: Writing the log to a JSON or Markdown file lets developers see exactly why an agent made a specific decision.
- Resuming: If an agent fails, a new process can read the state.json file and resume the loop from the last valid observation without restarting from scratch.
- Handoffs: In a multi-agent system, Agent A can write its findings to a shared workspace, which Agent B reads as its initial context.
Implementation Example
Instead of just appending to a string variable, write to a structured file after every step.
import json

def log_step(run_id, step_data):
    filename = f"/workspaces/agents/logs/{run_id}.jsonl"
    with open(filename, "a") as f:
        f.write(json.dumps(step_data) + "\n")
Fast.io workspaces work well for this. By mounting a Fast.io drive or using the MCP server, agents can treat the file system as their long-term memory, syncing their thought traces to a secure cloud that the human team can check on the web.
Implementing with Fast.io MCP
The Model Context Protocol (MCP) sets the standard for how agents work with external data. With the Fast.io MCP server, you can give your ReAct agent a persistent file system for both tool execution and state management.
Why MCP for ReAct?
MCP standardizes the tool definition. Instead of writing custom Python functions for read_file or write_file, you connect your agent to the Fast.io MCP server, which provides these tools automatically.
Configuration Steps
- Install the Server: Connect your agent environment (e.g., Claude Desktop, custom Python script, or LangChain) to the Fast.io MCP server.
- Define the Tools: Grant the agent the write_file and read_file capabilities.
- Update Instructions: Update the system prompt to instruct the agent to log its thoughts to a specific path, such as /logs/trace-[id].md.
The Handoff Pattern
This setup allows multi-agent workflows.
- Agent 1 (Researcher): Runs a ReAct loop to gather data, writing its "Final Answer" to research_summary.md.
- Agent 2 (Writer): Starts its loop by reading research_summary.md via MCP, using that context to draft a report.
This approach separates the agent's "brain" (the LLM) from its "memory" (the file system), ensuring that even if the reasoning model is swapped or the process terminates, the saved knowledge remains safe and accessible.
Frequently Asked Questions
What is the difference between ReAct and Chain-of-Thought?
Chain-of-Thought (CoT) asks the model to reason step-by-step internally before answering. ReAct extends this by adding an 'Action' step, where the model can use external tools to gather information, and an 'Observation' step to learn from the results. ReAct is essentially CoT plus external tools.
Why is the ReAct pattern better for reliability?
ReAct grounds the AI's responses in external reality. Instead of guessing facts, the agent must search or query for data. The explicit reasoning trace also allows developers to spot exactly where logic failed, making debugging much easier than with 'black box' prompts.
Does ReAct require a specific LLM?
While ReAct can work with many models, it requires an LLM capable of following complex instructions and adhering to a strict output format (like JSON or specific keywords). Frontier models such as GPT-4, Claude 3.5 Sonnet, and Gemini Pro are currently the most reliable at maintaining stable ReAct loops.
How do I debug a ReAct agent loop?
The best way to debug is to persist the 'Thought' and 'Observation' trace to a file. By reviewing this log, you can see if the agent is stuck in a loop, reading tool outputs wrong, or failing to generate correct tool arguments.
Can ReAct agents run indefinitely?
Technically yes, but practically no. Most implementations use a 'max_steps' parameter (e.g., 10 steps) to prevent infinite loops and control costs. If an agent hasn't solved the problem by then, it should exit and ask for human help.
Related Resources
Run ReAct pattern agent workflows on Fast.io
Stop losing agent context. Use Fast.io's free agent workspaces to persist thought traces, share state between agents, and debug complex workflows built around the ReAct pattern.