How to Build Fault-Tolerant AI Agents with Temporal
Temporal gives AI agents something they badly need: the ability to survive failures mid-execution and pick up exactly where they left off. This guide walks through integrating AI agents with Temporal workflows, covering architecture decisions, storage patterns for agent artifacts, and the practical steps to move from a fragile script to a production-grade system.
Why AI Agents Fail in Production
Most AI agent demos run fine on a laptop. Production is a different story. An agent that calls an LLM, processes the result, calls a tool, and writes output has four distinct failure points. Any one of them can crash the entire run.
Common failure modes include LLM API rate limits and timeouts, network interruptions during tool calls, out-of-memory errors on large context windows, and upstream service outages that last minutes or hours. When a naive agent fails at step 47 of a 50-step workflow, it starts over from step 1. That means repeated API calls, wasted tokens, and lost progress.
Temporal solves this with durable execution. Every step in a Temporal workflow gets checkpointed automatically. If the process crashes at step 47, it resumes from step 47, not step 1. No repeated work, no wasted tokens, no lost state.
This is not theoretical. OpenAI runs Codex, their AI coding agent, on Temporal in production. Temporal Cloud has processed over 9.1 trillion action executions across its customer base, and the platform supports millions of concurrent workflow executions.
How Temporal's Architecture Fits AI Agents
Temporal splits work into two categories: workflows and activities. Understanding this split is the key to building reliable agents.
Workflows are your orchestration logic. They decide what happens next, in what order, and what to do when something fails. Workflow code must be deterministic so Temporal can replay it after a crash. This does not mean your agent's decisions are predetermined. It means given the same sequence of activity results, the workflow takes the same path.
Activities are where the real work happens. LLM calls, tool executions, file uploads, database writes. Activities can fail, timeout, and retry. Temporal manages all of this automatically.
Here is a minimal example of a research agent workflow in Python:
```python
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# generate_plan, execute_research_task, synthesize_results, and
# store_artifacts are activities, defined in the Activities section below.

@workflow.defn
class ResearchAgentWorkflow:
    @workflow.run
    async def run(self, query: str) -> dict:
        # Step 1: Call the LLM to generate a research plan
        plan = await workflow.execute_activity(
            generate_plan,
            query,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )

        # Step 2: Execute each research task
        results = []
        for task in plan["tasks"]:
            result = await workflow.execute_activity(
                execute_research_task,
                task,
                start_to_close_timeout=timedelta(minutes=10),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )
            results.append(result)

        # Step 3: Synthesize findings
        synthesis = await workflow.execute_activity(
            synthesize_results,
            results,
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Step 4: Store the final output
        await workflow.execute_activity(
            store_artifacts,
            synthesis,
            start_to_close_timeout=timedelta(minutes=2),
        )
        return synthesis
```
If the process crashes during step 2 after completing three of five research tasks, Temporal replays the workflow. It skips steps that already completed and resumes from the fourth task. The three finished results come from Temporal's event history, not from re-calling the LLM.
Setting Up Temporal for Agent Workloads
You can run Temporal locally for development or use Temporal Cloud for production. Here is the setup path for both.
Local Development
Install the Temporal CLI and start a local dev server:
```shell
# Install Temporal CLI
brew install temporal  # macOS
# or: curl -sSf https://temporal.download/cli.sh | sh

# Start the local dev server
temporal server start-dev
```
Install the Python SDK alongside your agent framework:
```shell
pip install temporalio openai
```
The local dev server runs Temporal with an in-memory store. Good for development, not for production.
Production with Temporal Cloud
For production workloads, Temporal Cloud handles infrastructure, scaling, and persistence. Sign up at temporal.io, create a namespace, and configure your client with the Cloud endpoint and credentials.
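A client configuration sketch for Temporal Cloud using mTLS; the endpoint, namespace, and certificate paths are placeholders you would replace with your own values:

```python
from temporalio.client import Client
from temporalio.service import TLSConfig

async def connect_to_cloud() -> Client:
    # Placeholder paths: use the client certificate and key issued
    # for your Temporal Cloud namespace
    with open("client.pem", "rb") as f:
        client_cert = f.read()
    with open("client.key", "rb") as f:
        client_key = f.read()

    # Endpoint and namespace are placeholders for your own values
    return await Client.connect(
        "your-namespace.a1b2c.tmprl.cloud:7233",
        namespace="your-namespace.a1b2c",
        tls=TLSConfig(
            client_cert=client_cert,
            client_private_key=client_key,
        ),
    )
```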
Implementing Activities
Activities wrap your non-deterministic operations. Each LLM call, each tool invocation, each file operation gets its own activity:
```python
import json

from openai import AsyncOpenAI
from temporalio import activity

# Use the async client so LLM calls do not block the worker's event loop
client = AsyncOpenAI()

@activity.defn
async def generate_plan(query: str) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Create a research plan as JSON."},
            {"role": "user", "content": query},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

@activity.defn
async def execute_research_task(task: dict) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Research this topic thoroughly."},
            {"role": "user", "content": task["description"]},
        ],
    )
    return {
        "task": task["name"],
        "findings": response.choices[0].message.content,
    }
```
Starting the Worker
Workers poll for tasks and execute your workflow and activity code:
```python
import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

async def main():
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="research-agent",
        workflows=[ResearchAgentWorkflow],
        activities=[generate_plan, execute_research_task,
                    synthesize_results, store_artifacts],
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```
Run the worker, then trigger workflows via the Temporal CLI or client SDK.
Give Your Temporal Agents Persistent Storage
Fast.io's free agent tier includes 50 GB storage, file locks for multi-agent coordination, and an MCP server with 19 tools. No credit card required.
Storage Patterns for Temporal Agent Artifacts
Temporal persists workflow state and event history. It does not persist the files, documents, and datasets that agents produce. You need a separate storage layer for artifacts.
This is where most Temporal agent guides stop. They show the orchestration but skip the storage problem. In practice, artifact storage is one of the hardest parts of running agents in production.
The Claim-Check Pattern
When an activity produces a large output, store it externally and pass a reference (the "claim check") through the workflow. This keeps Temporal's event history lean. Temporal recommends this approach for any payload larger than a few kilobytes.
```python
import json

from temporalio import activity

@activity.defn
async def store_artifacts(synthesis: dict) -> str:
    # Upload to external storage, return a reference
    file_content = json.dumps(synthesis, indent=2).encode()
    # upload_to_workspace stands in for your storage client's upload call
    file_id = upload_to_workspace(file_content, "research/output.json")
    return file_id  # Only this small reference enters event history
```
State Snapshots for Long Conversations
Agents that maintain chat history across many turns accumulate large context. Serialize the conversation state periodically and store it externally. On resume after a crash, load the latest snapshot instead of replaying every message.
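A minimal sketch of the snapshot pattern, using a plain dict as a stand-in for external storage (the key scheme and store interface are illustrative assumptions):

```python
import json

def snapshot_conversation(messages: list, store: dict) -> str:
    """Serialize the chat history and save it under a key.
    `store` is a dict standing in for an external storage client."""
    key = f"conversation-snapshot-{len(messages)}"
    store[key] = json.dumps(messages)
    return key  # pass this small reference through the workflow

def restore_conversation(key: str, store: dict) -> list:
    """After a crash, load the latest snapshot instead of
    replaying every message."""
    return json.loads(store[key])
```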
Multi-Agent File Coordination
When multiple agents write to shared storage, you need coordination. File locks prevent two agents from overwriting each other's work. One agent acquires a lock, writes its output, and releases. The next agent can then safely read and build on that output.
Fast.io handles these patterns out of the box. The free agent tier includes 50 GB of storage, file versioning, and file locks for concurrent access. Agents connect through the MCP server with 19 consolidated tools covering workspace, storage, AI, and workflow operations. Intelligence Mode auto-indexes uploaded files for semantic search, so agents can query previous outputs without building a separate vector database.
For teams already using S3 or Google Cloud Storage, those work too. The key principle is: keep large artifacts out of Temporal's event history and use references to retrieve them.
Retry Policies and Error Handling for LLM Calls
LLM APIs fail in predictable ways. Rate limits return 429 status codes. Timeouts happen when the model takes too long. Malformed responses occur when the model ignores your schema instructions. Each failure type needs a different retry strategy.
Configuring Retry Policies
Temporal's default retry policy uses exponential backoff with a 2x coefficient, starting at 1 second and capping at 100 seconds. For LLM activities, you will want to customize this:
```python
from datetime import timedelta

from temporalio.common import RetryPolicy

llm_retry = RetryPolicy(
    initial_interval=timedelta(seconds=2),
    backoff_coefficient=2.0,
    maximum_interval=timedelta(seconds=60),
    maximum_attempts=5,
    non_retryable_error_types=["ValidationError"],
)
```
Set non_retryable_error_types for errors that will never succeed on retry. A parse failure is worth retrying: the model is non-deterministic, so the next attempt may return valid JSON. An invalid input is not, because no number of retries fixes it; raise an error type listed as non-retryable so the workflow fails fast instead of burning tokens.
Timeouts That Match Reality
Set start_to_close_timeout based on actual LLM latency. A simple completion might take 5 seconds. A complex reasoning chain with tool use might take 2 minutes. Measure your P99 latency and set the timeout at 2-3x that value.
```python
# Quick classification task
result = await workflow.execute_activity(
    classify_document,
    doc,
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=llm_retry,
)

# Complex multi-step reasoning
analysis = await workflow.execute_activity(
    deep_analysis,
    data,
    start_to_close_timeout=timedelta(minutes=5),
    retry_policy=llm_retry,
)
```
Handling Rate Limits Gracefully
When you hit rate limits, Temporal's exponential backoff helps, but you can also use activity heartbeats to report progress during long-running tasks. If a task is actively working but slow, heartbeats prevent Temporal from timing it out prematurely.
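The heartbeat shape can be sketched framework-free; here the callback stands in for `temporalio.activity.heartbeat`, and the work itself is a placeholder:

```python
def process_batch(items: list, report_progress) -> list:
    """Process items one at a time, reporting progress after each.
    Inside a Temporal activity, report_progress would be
    activity.heartbeat, so a slow-but-alive task keeps resetting its
    heartbeat timeout instead of being timed out."""
    results = []
    for i, item in enumerate(items, start=1):
        results.append(item.upper())  # placeholder for the real work
        report_progress(i)  # e.g. activity.heartbeat(i)
    return results
```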
Human-in-the-Loop Escalation
For failures that automated retries cannot fix, use Temporal signals to pause the workflow and wait for human input:
```python
from datetime import timedelta

from temporalio import workflow

@workflow.defn
class AgentWithEscalation:
    def __init__(self) -> None:
        self.human_response: str | None = None

    @workflow.signal
    async def provide_human_input(self, response: str):
        self.human_response = response

    @workflow.run
    async def run(self, task: str) -> str:
        try:
            return await workflow.execute_activity(
                agent_task, task,
                start_to_close_timeout=timedelta(minutes=10),
            )
        except Exception:
            # Notify a human, then block until the signal arrives
            await workflow.execute_activity(
                notify_human, task,
                start_to_close_timeout=timedelta(minutes=1),
            )
            await workflow.wait_condition(
                lambda: self.human_response is not None
            )
            return self.human_response
```
This pattern works well for agents that handle sensitive operations where automated fallbacks are not appropriate.
Multi-Agent Orchestration Patterns
Single-agent workflows are straightforward. Production systems often need multiple specialized agents coordinating on a shared task. Temporal supports several patterns for this.
Agent Routing
A router workflow receives requests and delegates to specialized agents. A customer service system might route billing questions to a billing agent and technical issues to a support agent. Each agent runs as a child workflow with its own retry policies and timeouts.
```python
from datetime import timedelta

from temporalio import workflow
from temporalio.exceptions import ApplicationError

@workflow.defn
class RouterWorkflow:
    @workflow.run
    async def run(self, request: dict) -> str:
        # Classify the request
        category = await workflow.execute_activity(
            classify_request, request,
            start_to_close_timeout=timedelta(seconds=30),
        )

        # Route to a specialized agent running as a child workflow
        if category == "billing":
            return await workflow.execute_child_workflow(
                BillingAgentWorkflow.run, request
            )
        elif category == "technical":
            return await workflow.execute_child_workflow(
                TechnicalAgentWorkflow.run, request
            )
        # Fail explicitly rather than silently returning None
        raise ApplicationError(f"No agent registered for category: {category}")
```
Task Delegation
For complex research or analysis, break the work into subtasks and fan out to multiple agents. A lead agent creates the plan, delegates tasks to specialist agents running as child workflows, and then synthesizes their outputs.
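Because Temporal's Python workflows are asyncio code, fan-out is plain `asyncio.gather`. This sketch simulates the children with coroutines; in a real workflow each `run_subtask` call would be `workflow.execute_child_workflow(...)`:

```python
import asyncio

async def run_subtask(task: str) -> str:
    # Stand-in for workflow.execute_child_workflow(SpecialistWorkflow.run, task)
    await asyncio.sleep(0)
    return f"findings for {task}"

async def delegate(tasks: list[str]) -> list[str]:
    # Fan out to all specialists concurrently; results come back in input order
    return list(await asyncio.gather(*(run_subtask(t) for t in tasks)))
```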
Pipeline Chains
Sequential processing where each agent's output feeds the next. A data pipeline might use one agent to extract information, another to validate it, and a third to transform it into the final format. Each stage is a separate workflow, connected through Temporal's event system or through shared storage.
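A pipeline chain reduces to folding the data through an ordered list of stages; the extract/validate/transform functions here are trivial placeholders for the real agents:

```python
def extract(raw: str) -> dict:
    return {"text": raw.strip()}

def validate(doc: dict) -> dict:
    if not doc["text"]:
        raise ValueError("empty document")
    return doc

def transform(doc: dict) -> dict:
    return {"summary": doc["text"].upper()}

def run_pipeline(raw: str, stages=(extract, validate, transform)):
    """Each stage's output feeds the next; in Temporal each stage would
    be its own workflow, triggered by the previous one finishing."""
    result = raw
    for stage in stages:
        result = stage(result)
    return result
```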
Coordination Through Shared Storage
When agents share a Fast.io workspace, they can read each other's outputs directly. Agent A writes research findings to the workspace. Agent B picks them up, adds analysis, and writes back. File locks prevent conflicts, and Intelligence Mode lets agents search previous outputs semantically rather than by exact filename.
The ownership transfer pattern also fits here. An agent builds a complete workspace with research, analysis, and recommendations. When the work is done, it transfers ownership to a human reviewer who gets full access through the same workspace UI. The agent retains admin access for follow-up tasks.
Deploying to Production
Moving from local development to production requires decisions about infrastructure, observability, and scaling.
Infrastructure
Temporal Cloud is the simplest production path. It handles server infrastructure, persistence, and scaling. You deploy only your workers, which run your workflow and activity code. For self-hosted deployments, Temporal recommends starting with 512 shards for small production clusters.
Observability
Temporal's Web UI shows running workflows, their current state, and complete event histories. For agent workloads, add structured logging in your activities so you can trace which LLM calls happened, what they returned, and how long they took.
Track these metrics for agent-specific monitoring: workflow completion rate, average retries per activity, LLM token usage per workflow, end-to-end latency from start to final output, and artifact storage volume.
Scaling Workers
Workers are stateless. Scale them horizontally by running more instances. For agent workloads, the bottleneck is usually LLM API rate limits, not worker capacity. Match your worker count to your API throughput.
Storage at Scale
As agent output volume grows, your storage layer needs to keep up. Fast.io's free agent tier provides 50 GB with no credit card required. For larger deployments, workspaces support granular permissions so different agent teams can have isolated storage with separate access controls.
Webhooks connect storage events back to Temporal. When a file is uploaded or modified, a webhook can trigger a new workflow or signal an existing one. This creates reactive pipelines where agents respond to new data automatically, without polling.
Security Considerations
Agent workflows often handle sensitive data. Use Temporal's namespace isolation to separate workloads. Encrypt payloads with Temporal's data converter for sensitive activity inputs and outputs. For artifact storage, workspace-level permissions and audit trails track who accessed what and when.
Frequently Asked Questions
What does Temporal add to an AI agent that a simple retry loop does not?
A retry loop handles individual function failures. Temporal handles entire workflow failures. If your process crashes after completing 40 of 50 steps, a retry loop starts over. Temporal resumes from step 41. It also provides visibility into running workflows, automatic state persistence, and built-in support for timeouts, signals, and child workflows.
How do I store files and artifacts from Temporal agent workflows?
Use the claim-check pattern. Store large outputs in external storage like Fast.io, S3, or Google Cloud Storage, and pass a small reference through the workflow. This keeps Temporal's event history lean and your artifacts durable. Fast.io's MCP server provides 19 tools for workspace and storage operations that agents can call directly from Temporal activities.
Can Temporal handle multi-agent systems where agents coordinate?
Yes. Temporal supports child workflows, signals, and shared state. Route requests to specialized agents using a router workflow, delegate subtasks with child workflows, or coordinate through shared storage. File locks prevent conflicts when multiple agents write to the same workspace.
What LLM frameworks work with Temporal?
Temporal has official integrations with the OpenAI Agents SDK and Pydantic AI. The Python and TypeScript SDKs work with any LLM framework, including LangChain, LlamaIndex, and direct API calls. Wrap your LLM calls in Temporal activities and the orchestration layer handles retries and persistence.
Is Temporal overkill for simple agent tasks?
For a single LLM call with a retry, yes. Temporal adds value when workflows have multiple steps, run for more than a few minutes, need human-in-the-loop approval, or coordinate multiple agents. If your agent is a single prompt-and-response, a simple retry wrapper is enough.