AI & Agents

How to Implement the Reflection Pattern for Self-Correcting Agents

The Reflection pattern lets AI agents review their own work, catch errors, and fix them without human help. By implementing a "generate, reflect, refine" loop, developers can reduce hallucinations and increase task success rates. This guide breaks down the architecture of self-correcting agents and how to add persistent memory for long-term improvement.

Fast.io Editorial Team · 8 min read
The Reflection pattern adds a feedback loop to standard agent workflows.

What Is the Reflection Pattern?

The Reflection pattern is a design strategy where an AI agent reviews its own output before marking a task done. Instead of accepting the first draft, the agent acts as its own reviewer. It finds mistakes, hallucinations, or logical gaps, then revises its work.

In a standard "chain of thought" workflow, an agent generates a response and stops. In a Reflection workflow, the agent enters a loop: it generates an initial output, asks itself to review that output, and uses that review to generate a better version.

This works like human drafting. We rarely write a perfect email or report in one go. We write, read it over, fix typos or unclear phrasing, and then send. Self-correcting agents apply this principle to code generation, content writing, and complex reasoning tasks.

Why It Matters

Self-correction works. According to research on the "Reflexion" architecture, agents that review their own code can solve many more problems than those that just generate code once. The ability to "think about thinking" (metacognition) turns a text predictor into a reliable problem solver.


Flowchart comparing a linear agent workflow to a recursive reflection workflow

The 3 Steps of the Reflection Loop

You need three steps to build a self-correcting agent. You can cram all three into a single LLM call, but the loop works better when each step is a separate call or a distinct agent persona.

1. Generate (The Draft)

The agent tries the task using the original prompt. This is the "first draft." Instruct the agent to show its work, as this gives more context for the review phase. For example, if asked to write a Python script, it generates the full code block.

2. Reflect (The Critique)

The agent reviews its generated output against specific rules. This step is important: you must instruct the agent to be critical. If you just ask "Is this good?", an LLM will often blindly say "Yes." Instead, prompt it to:

  • "Find multiple potential security vulnerabilities."
  • "Identify any logic errors."
  • "List edge cases this code misses."

This stage produces a feedback list, not a new draft.

3. Refine (The Final Polish)

The agent takes the original prompt, the first draft, and the critique, and produces a final version. This version addresses every point raised in the reflection phase. The prompt here is simple: "Rewrite the code to address the following critique points."

Pro Tip: For complex tasks, you can loop through these steps multiple times. Always set a maximum number of iterations to prevent infinite loops and high token costs.
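The three steps above can be sketched as a single loop. This is a minimal Python sketch, not a production implementation: `llm` stands in for whatever model call you use (a hypothetical callable that takes a prompt string and returns a string), and the iteration cap guards against runaway loops.

```python
def reflection_loop(task, llm, max_iterations=3):
    """Generate -> Reflect -> Refine, capped at max_iterations rounds.

    `llm` is a placeholder for a real model call: prompt string in,
    completion string out.
    """
    # 1. Generate: produce the first draft.
    draft = llm(f"Complete this task. Show your work.\n\nTask: {task}")

    for _ in range(max_iterations):
        # 2. Reflect: force a critical review, not a yes/no rubber stamp.
        critique = llm(
            "Critically review the draft below. Find logic errors, "
            "security issues, and missed edge cases. "
            "Reply 'No changes needed' only if it is flawless.\n\n"
            f"Task: {task}\n\nDraft:\n{draft}"
        )
        # Stop condition: the critic is satisfied.
        if "no changes needed" in critique.lower():
            break

        # 3. Refine: rewrite the draft against the critique.
        draft = llm(
            "Rewrite the draft to address every critique point.\n\n"
            f"Task: {task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

The explicit stop condition matters as much as the cap: without it, the agent keeps paying for inference rounds even after the critique has nothing left to say.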

Example: Self-Correcting Code Generation

Imagine an agent asked to write a function that sorts a list.

  • Draft: Writes a bubble sort (quadratic time).
  • Reflect: "Critique: Bubble sort is slow for large lists. The prompt didn't specify list size, but an O(n log n) algorithm is safer."
  • Refine: Rewrites the function using Quicksort or Merge Sort.

Why Persistence Matters for Self-Correction

Most Reflection pattern tutorials miss one thing: memory. If an agent corrects a mistake in one session but makes the exact same mistake in the next, it hasn't learned. It has only corrected a single instance.

To self-correct, agents need persistent memory. By storing successful reflections and past critiques, an agent can consult its own history before generating new content. This changes the pattern from "Reflect on this task" to "Reflect on my past performance."

Storage Strategy for Agents:

  1. Log every critique: Save the feedback generated during the "Reflect" phase to a structured log file (e.g., JSON or Markdown) in your Fast.io workspace.
  2. Index by topic: Tag these logs so the agent can retrieve relevant past mistakes when starting a new similar task.
  3. Consult before generating: Modify the "Generate" prompt to include: "Review your past mistakes on this topic: [insert retrieved logs]."

This turns a stateless LLM into a system that learns. Instead of just reacting to the current prompt, it remembers its past corrections.

Abstract visualization of AI agent memory blocks being stored and retrieved
Fast.io features

Give Your AI Agents Persistent Storage

Fast.io gives your self-correcting agents the storage they need to learn and improve over time. Built for Reflection-pattern, self-correcting agent workflows.

Advanced Reflection Architectures

Once you have the basic loop working, you can explore more advanced variations of the pattern.

Reflexion (Verbal Reinforcement)

The "Reflexion" method treats the critique as verbal reinforcement learning. Instead of just fixing the code, the agent summarizes why it failed (e.g., "I forgot to import the math library"). This summary is stored in a sliding window of memory. In future steps, the agent reads these summaries to avoid repeating the same error types.

Multi-Agent Debate

Instead of one agent critiquing itself, use two agents with different personas.

  • Agent A (Builder): Generates the solution.
  • Agent B (Critic): Dedicated to finding flaws.

This reduces the "Yes Man" bias, where a model struggles to critique its own output.
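The builder/critic split can be sketched with two system prompts over the same model. `chat` is a hypothetical `chat(system_prompt, user_message)` helper wrapping whatever model API you use:

```python
# Two personas over one model: the persona lives in the system prompt.
BUILDER_SYSTEM = "You are a senior engineer. Produce a complete solution."
CRITIC_SYSTEM = (
    "You are a harsh code reviewer who hates inefficiency. "
    "Your only job is to find flaws. Never say the work is fine "
    "unless you can find nothing at all."
)

def debate_round(task, chat):
    """One build -> critique -> revise round between two personas."""
    solution = chat(BUILDER_SYSTEM, task)
    critique = chat(CRITIC_SYSTEM,
                    f"Task: {task}\n\nSolution:\n{solution}")
    revised = chat(BUILDER_SYSTEM,
                   f"Task: {task}\n\nYour solution:\n{solution}\n\n"
                   f"Reviewer feedback:\n{critique}\n\nRevise accordingly.")
    return revised
```

Because the critic never sees its own persona's output as "mine," it has no face to save, which is exactly what blunts the Yes Man bias.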

Tree of Thoughts

This extends reflection by generating multiple possible next steps, reflecting on each one to score its quality, and then choosing the best path. It's like a game of chess where the agent looks several moves ahead before committing.

How to Build It with Fast.io

Fast.io supports self-correcting agents by giving them a shared filesystem for thoughts and memories. Unlike complex vector databases, Fast.io offers simple, file-based long-term memory.

Step 1: Initialize Your Workspace

Create a dedicated workspace for your agent. This is where it will store its logs and memory files.

clawhub install dbalve/fast-io

Step 2: Create a Memory File

Have your agent create a memory.md file. This file serves as a running log of past mistakes.

### Agent Memory
- [LOG-ENTRY-A] Failed to handle null inputs in Python function. Fix: Always add "if input is None" check.
- [LOG-ENTRY-B] Hallucinated a citation. Fix: Verify URLs before finalizing text.

Step 3: The Connected Loop

Now, connect the reflection loop to this file using the Fast.io MCP server:

  1. Read: Before generating, the agent reads memory.md.
  2. Generate: It writes the draft, applying lessons from the memory file.
  3. Reflect: It critiques the draft.
  4. Write: If it finds a new type of error, it appends a new rule to memory.md.

The more your agent works, the smarter and more reliable it becomes, without you needing to fine-tune the model itself.

Common Pitfalls to Avoid

The Reflection pattern introduces new challenges that developers must manage.

The "Yes Man" Problem

If the same model is used for both generation and reflection, it may be biased to approve its own work. It often fails to spot subtle logic errors because the same underlying probabilistic patterns produced them.

  • Fix: Use a stronger model for the Reflection step (e.g., use GPT-4o to critique a Llama draft), or use a different system prompt persona ("You are a harsh code reviewer who hates inefficiency...").

Context Window Bloat

Storing the draft, the critique, and the revision in the context window consumes tokens rapidly.

  • Fix: Use Fast.io to offload intermediate steps to file storage. Instead of keeping everything in context, write the draft to a file, read it for the critique, and write the critique to a separate file. This keeps the active context window clean and reduces costs.

Latency vs. Quality

Reflection increases the compute time for a task because you are running multiple inference calls instead of a single step.

  • Fix: Only apply reflection to high-stakes tasks or when the initial confidence score is low. Not every user query needs a multi-step review process.
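A sketch of this gating logic. The threshold value and the `confidence_of` scoring function (e.g., a log-probability heuristic or a cheap classifier) are assumptions you would tune for your own stack:

```python
def answer(task, llm, confidence_of, high_stakes=False, threshold=0.8):
    """Single-pass by default; reflect only when it is worth the cost.

    `confidence_of(draft)` is a placeholder scorer returning a float
    in [0, 1]; `threshold` is an illustrative cutoff.
    """
    draft = llm(task)
    # Gate: reflect only on high-stakes tasks or low-confidence drafts.
    if high_stakes or confidence_of(draft) < threshold:
        critique = llm(f"Critique:\n{draft}")
        draft = llm(f"Draft:\n{draft}\n\nCritique:\n{critique}\n\nRevise.")
    return draft
```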

Frequently Asked Questions

What is the difference between Reflection and Chain of Thought?

Chain of Thought (CoT) asks the model to 'think out loud' step-by-step before generating an answer. Reflection asks the model to generate an answer first, then critique it, and then generate a new, better answer. CoT happens during generation; Reflection happens after generation.

How much does the Reflection pattern improve accuracy?

According to the 'Reflexion' paper (Shinn et al.), using this pattern improved performance on the HumanEval coding benchmark from 80% (GPT-4) to 91%. This significant boost is important for autonomous coding agents.

Does the Reflection pattern require two different LLMs?

No, you can use the same LLM for both generation and reflection. However, using a more capable model for the reflection (critique) phase often gives better results, as it can catch details the smaller model missed.

How do I prevent my agent from getting stuck in a loop?

Always implement a `max_iterations` counter (e.g., three loops). Also, add a stop condition: if the critique phase returns 'No changes needed,' the loop should terminate immediately.

Can I use the Reflection pattern with open-source models?

Yes, open-source models like Llama and Mixtral are good candidates for the Reflection pattern. In fact, using Reflection can often boost the performance of smaller open-source models to match proprietary ones like GPT-4.
