How to Build Fault-Tolerant AI Agents with Temporal
Temporal gives AI agents something they badly need: the ability to survive failures mid-execution and pick up exactly where they left off. This guide walks through integrating AI agents with Temporal workflows, covering architecture decisions, storage patterns for agent artifacts, and the practical steps to move from a fragile script to a production-grade system.
Why AI Agents Fail in Production
Most AI agent demos run fine on a laptop. Production is a different story. An agent that calls an LLM, processes the result, calls a tool, and writes output has four distinct failure points. Any one of them can crash the entire run.
Common failure modes include LLM API rate limits and timeouts, network interruptions during tool calls, out-of-memory errors on large context windows, and upstream service outages that last minutes or hours. When a naive agent fails at step 47 of a 50-step workflow, it starts over from step 1. That means repeated API calls, wasted tokens, and lost progress.
Temporal solves this with durable execution. Every step in a Temporal workflow gets checkpointed automatically. If the process crashes at step 47, it resumes from step 47, not step 1. No repeated work, no wasted tokens, no lost state.
This is not theoretical. OpenAI runs Codex, their AI coding agent, on Temporal in production. Temporal Cloud has processed over 9.1 trillion action executions across its customer base, and the platform supports millions of concurrent workflow executions.
How Temporal's Architecture Fits AI Agents
Temporal splits work into two categories: workflows and activities. Understanding this split is the key to building reliable agents.
Workflows are your orchestration logic. They decide what happens next, in what order, and what to do when something fails. Workflow code must be deterministic so Temporal can replay it after a crash. This does not mean your agent's decisions are predetermined. It means given the same sequence of activity results, the workflow takes the same path.
Activities are where the real work happens. LLM calls, tool executions, file uploads, database writes. Activities can fail, timeout, and retry. Temporal manages all of this automatically.
Here is a minimal example of a research agent workflow in Python:
```python
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# generate_plan, execute_research_task, synthesize_results, and
# store_artifacts are activities, defined in the Activities section below.

@workflow.defn
class ResearchAgentWorkflow:
    @workflow.run
    async def run(self, query: str) -> dict:
        # Step 1: Call the LLM to generate a research plan
        plan = await workflow.execute_activity(
            generate_plan,
            query,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )

        # Step 2: Execute each research task
        results = []
        for task in plan["tasks"]:
            result = await workflow.execute_activity(
                execute_research_task,
                task,
                start_to_close_timeout=timedelta(minutes=10),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )
            results.append(result)

        # Step 3: Synthesize findings
        synthesis = await workflow.execute_activity(
            synthesize_results,
            results,
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Step 4: Store the final output
        await workflow.execute_activity(
            store_artifacts,
            synthesis,
            start_to_close_timeout=timedelta(minutes=2),
        )
        return synthesis
```
If the process crashes during step 2 after completing three of five research tasks, Temporal replays the workflow. It skips steps that already completed and resumes from the fourth task. The three finished results come from Temporal's event history, not from re-calling the LLM.
Setting Up Temporal for Agent Workloads
You can run Temporal locally for development or use Temporal Cloud for production. Here is the setup path for both.
Local Development
Install the Temporal CLI and start a local dev server:
```shell
# Install Temporal CLI
brew install temporal  # macOS
# or: curl -sSf https://temporal.download/cli.sh | sh

# Start the local dev server
temporal server start-dev
```
Install the Python SDK alongside your agent framework:
```shell
pip install temporalio openai
```
The local dev server runs Temporal with an in-memory store. Good for development, not for production.
Production with Temporal Cloud
For production workloads, Temporal Cloud handles infrastructure, scaling, and persistence. Sign up at temporal.io, create a namespace, and configure your client with the Cloud endpoint and credentials.
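A client configuration sketch for Temporal Cloud using mTLS; the endpoint, namespace, and certificate paths are placeholders you would replace with your own values:

```python
from temporalio.client import Client
from temporalio.service import TLSConfig

async def connect_to_cloud() -> Client:
    # Placeholder paths: use the client certificate and key issued
    # for your Temporal Cloud namespace
    with open("client.pem", "rb") as f:
        client_cert = f.read()
    with open("client.key", "rb") as f:
        client_key = f.read()

    # Endpoint and namespace are placeholders for your own values
    return await Client.connect(
        "your-namespace.a1b2c.tmprl.cloud:7233",
        namespace="your-namespace.a1b2c",
        tls=TLSConfig(
            client_cert=client_cert,
            client_private_key=client_key,
        ),
    )
```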
Implementing Activities
Activities wrap your non-deterministic operations. Each LLM call, each tool invocation, each file operation gets its own activity:
```python
import json

from openai import AsyncOpenAI
from temporalio import activity

# Use the async client so LLM calls do not block the worker's event loop
client = AsyncOpenAI()

@activity.defn
async def generate_plan(query: str) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Create a research plan as JSON."},
            {"role": "user", "content": query},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

@activity.defn
async def execute_research_task(task: dict) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Research this topic thoroughly."},
            {"role": "user", "content": task["description"]},
        ],
    )
    return {
        "task": task["name"],
        "findings": response.choices[0].message.content,
    }
```
Starting the Worker
Workers poll for tasks and execute your workflow and activity code:
```python
import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

async def main():
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="research-agent",
        workflows=[ResearchAgentWorkflow],
        activities=[generate_plan, execute_research_task,
                    synthesize_results, store_artifacts],
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```
Run the worker, then trigger workflows via the Temporal CLI or client SDK.
Give Your Temporal Agents Persistent Storage
Fast.io's free agent tier includes 50 GB storage, file locks for multi-agent coordination, and an MCP server with 19 tools. No credit card required.
Storage Patterns for Temporal Agent Artifacts
Temporal persists workflow state and event history. It does not persist the files, documents, and datasets that agents produce. You need a separate storage layer for artifacts.
This is where most Temporal agent guides stop. They show the orchestration but skip the storage problem. In practice, artifact storage is one of the hardest parts of running agents in production.
The Claim-Check Pattern
When an activity produces a large output, store it externally and pass a reference (the "claim check") through the workflow. This keeps Temporal's event history lean. Temporal recommends this approach for any payload larger than a few kilobytes.
```python
import json

from temporalio import activity

@activity.defn
async def store_artifacts(synthesis: dict) -> str:
    # Upload to external storage, return a reference
    file_content = json.dumps(synthesis, indent=2).encode()
    # upload_to_workspace stands in for your storage client's upload call
    file_id = upload_to_workspace(file_content, "research/output.json")
    return file_id  # Only this small reference enters event history
```
State Snapshots for Long Conversations
Agents that maintain chat history across many turns accumulate large context. Serialize the conversation state periodically and store it externally. On resume after a crash, load the latest snapshot instead of replaying every message.
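A minimal sketch of the snapshot pattern, using a plain dict as a stand-in for external storage (the key scheme and store interface are illustrative assumptions):

```python
import json

def snapshot_conversation(messages: list, store: dict) -> str:
    """Serialize the chat history and save it under a key.
    `store` is a dict standing in for an external storage client."""
    key = f"conversation-snapshot-{len(messages)}"
    store[key] = json.dumps(messages)
    return key  # pass this small reference through the workflow

def restore_conversation(key: str, store: dict) -> list:
    """After a crash, load the latest snapshot instead of
    replaying every message."""
    return json.loads(store[key])
```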
Multi-Agent File Coordination
When multiple agents write to shared storage, you need coordination. File locks prevent two agents from overwriting each other's work. One agent acquires a lock, writes its output, and releases. The next agent can then safely read and build on that output.
Fast.io handles these patterns out of the box. The free agent tier includes 50 GB of storage, file versioning, and file locks for concurrent access. Agents connect through the MCP server with 19 consolidated tools covering workspace, storage, AI, and workflow operations. Intelligence Mode auto-indexes uploaded files for semantic search, so agents can query previous outputs without building a separate vector database.
For teams already using S3 or Google Cloud Storage, those work too. The key principle is: keep large artifacts out of Temporal's event history and use references to retrieve them.
Retry Policies and Error Handling for LLM Calls
LLM APIs fail in predictable ways. Rate limits return 429 status codes. Timeouts happen when the model takes too long. Malformed responses occur when the model ignores your schema instructions. Each failure type needs a different retry strategy.
Configuring Retry Policies
Temporal's default retry policy uses exponential backoff with a 2x coefficient, starting at 1 second and capping at 100 seconds. For LLM activities, you will want to customize this:
```python
from datetime import timedelta

from temporalio.common import RetryPolicy

llm_retry = RetryPolicy(
    initial_interval=timedelta(seconds=2),
    backoff_coefficient=2.0,
    maximum_interval=timedelta(seconds=60),
    maximum_attempts=5,
    non_retryable_error_types=["ValidationError"],
)
```
Set non_retryable_error_types for errors that will never succeed on retry. A parse failure is worth retrying: the model is non-deterministic, so the next attempt may return valid JSON. An invalid input is not, because no number of retries fixes it; raise an error type listed as non-retryable so the workflow fails fast instead of burning tokens.
Timeouts That Match Reality
Set start_to_close_timeout based on actual LLM latency. A simple completion might take 5 seconds. A complex reasoning chain with tool use might take 2 minutes. Measure your P99 latency and set the timeout at 2-3x that value.
```python
# Quick classification task
result = await workflow.execute_activity(
    classify_document,
    doc,
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=llm_retry,
)

# Complex multi-step reasoning
analysis = await workflow.execute_activity(
    deep_analysis,
    data,
    start_to_close_timeout=timedelta(minutes=5),
    retry_policy=llm_retry,
)
```
Handling Rate Limits Gracefully
When you hit rate limits, Temporal's exponential backoff helps, but you can also use activity heartbeats to report progress during long-running tasks. If a task is actively working but slow, heartbeats prevent Temporal from timing it out prematurely.
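The heartbeat shape can be sketched framework-free; here the callback stands in for `temporalio.activity.heartbeat`, and the work itself is a placeholder:

```python
def process_batch(items: list, report_progress) -> list:
    """Process items one at a time, reporting progress after each.
    Inside a Temporal activity, report_progress would be
    activity.heartbeat, so a slow-but-alive task keeps resetting its
    heartbeat timeout instead of being timed out."""
    results = []
    for i, item in enumerate(items, start=1):
        results.append(item.upper())  # placeholder for the real work
        report_progress(i)  # e.g. activity.heartbeat(i)
    return results
```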
Human-in-the-Loop Escalation
For failures that automated retries cannot fix, use Temporal signals to pause the workflow and wait for human input:
```python
from datetime import timedelta

from temporalio import workflow

@workflow.defn
class AgentWithEscalation:
    def __init__(self) -> None:
        self.human_response: str | None = None

    @workflow.signal
    async def provide_human_input(self, response: str):
        self.human_response = response

    @workflow.run
    async def run(self, task: str) -> str:
        try:
            return await workflow.execute_activity(
                agent_task, task,
                start_to_close_timeout=timedelta(minutes=10),
            )
        except Exception:
            # Notify a human, then block until the signal arrives
            await workflow.execute_activity(
                notify_human, task,
                start_to_close_timeout=timedelta(minutes=1),
            )
            await workflow.wait_condition(
                lambda: self.human_response is not None
            )
            return self.human_response
```
This pattern works well for agents that handle sensitive operations where automated fallbacks are not appropriate.
Multi-Agent Orchestration Patterns
Single-agent workflows are straightforward. Production systems often need multiple specialized agents coordinating on a shared task. Temporal supports several patterns for this.
Agent Routing
A router workflow receives requests and delegates to specialized agents. A customer service system might route billing questions to a billing agent and technical issues to a support agent. Each agent runs as a child workflow with its own retry policies and timeouts.
```python
from datetime import timedelta

from temporalio import workflow
from temporalio.exceptions import ApplicationError

@workflow.defn
class RouterWorkflow:
    @workflow.run
    async def run(self, request: dict) -> str:
        # Classify the request
        category = await workflow.execute_activity(
            classify_request, request,
            start_to_close_timeout=timedelta(seconds=30),
        )

        # Route to a specialized agent running as a child workflow
        if category == "billing":
            return await workflow.execute_child_workflow(
                BillingAgentWorkflow.run, request
            )
        elif category == "technical":
            return await workflow.execute_child_workflow(
                TechnicalAgentWorkflow.run, request
            )
        # Fail explicitly rather than silently returning None
        raise ApplicationError(f"No agent registered for category: {category}")
```
Task Delegation
For complex research or analysis, break the work into subtasks and fan out to multiple agents. A lead agent creates the plan, delegates tasks to specialist agents running as child workflows, and then synthesizes their outputs.
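Because Temporal's Python workflows are asyncio code, fan-out is plain `asyncio.gather`. This sketch simulates the children with coroutines; in a real workflow each `run_subtask` call would be `workflow.execute_child_workflow(...)`:

```python
import asyncio

async def run_subtask(task: str) -> str:
    # Stand-in for workflow.execute_child_workflow(SpecialistWorkflow.run, task)
    await asyncio.sleep(0)
    return f"findings for {task}"

async def delegate(tasks: list[str]) -> list[str]:
    # Fan out to all specialists concurrently; results come back in input order
    return list(await asyncio.gather(*(run_subtask(t) for t in tasks)))
```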
Pipeline Chains
Sequential processing where each agent's output feeds the next. A data pipeline might use one agent to extract information, another to validate it, and a third to transform it into the final format. Each stage is a separate workflow, connected through Temporal's event system or through shared storage.
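A pipeline chain reduces to folding the data through an ordered list of stages; the extract/validate/transform functions here are trivial placeholders for the real agents:

```python
def extract(raw: str) -> dict:
    return {"text": raw.strip()}

def validate(doc: dict) -> dict:
    if not doc["text"]:
        raise ValueError("empty document")
    return doc

def transform(doc: dict) -> dict:
    return {"summary": doc["text"].upper()}

def run_pipeline(raw: str, stages=(extract, validate, transform)):
    """Each stage's output feeds the next; in Temporal each stage would
    be its own workflow, triggered by the previous one finishing."""
    result = raw
    for stage in stages:
        result = stage(result)
    return result
```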
Coordination Through Shared Storage
When agents share a Fast.io workspace, they can read each other's outputs directly. Agent A writes research findings to the workspace. Agent B picks them up, adds analysis, and writes back. File locks prevent conflicts, and Intelligence Mode lets agents search previous outputs semantically rather than by exact filename.
The ownership transfer pattern also fits here. An agent builds a complete workspace with research, analysis, and recommendations. When the work is done, it transfers ownership to a human reviewer who gets full access through the same workspace UI. The agent retains admin access for follow-up tasks.
Deploying to Production
Moving from local development to production requires decisions about infrastructure, observability, and scaling.
Infrastructure
Temporal Cloud is the simplest production path. It handles server infrastructure, persistence, and scaling. You deploy only your workers, which run your workflow and activity code. For self-hosted deployments, Temporal recommends starting with 512 shards for small production clusters.
Observability
Temporal's Web UI shows running workflows, their current state, and complete event histories. For agent workloads, add structured logging in your activities so you can trace which LLM calls happened, what they returned, and how long they took.
Track these metrics for agent-specific monitoring: workflow completion rate, average retries per activity, LLM token usage per workflow, end-to-end latency from start to final output, and artifact storage volume.
Scaling Workers
Workers are stateless. Scale them horizontally by running more instances. For agent workloads, the bottleneck is usually LLM API rate limits, not worker capacity. Match your worker count to your API throughput.
Storage at Scale
As agent output volume grows, your storage layer needs to keep up. Fast.io's free agent tier provides 50 GB with no credit card required. For larger deployments, workspaces support granular permissions so different agent teams can have isolated storage with separate access controls.
Webhooks connect storage events back to Temporal. When a file is uploaded or modified, a webhook can trigger a new workflow or signal an existing one. This creates reactive pipelines where agents respond to new data automatically, without polling.
Security Considerations
Agent workflows often handle sensitive data. Use Temporal's namespace isolation to separate workloads. Encrypt payloads with Temporal's data converter for sensitive activity inputs and outputs. For artifact storage, workspace-level permissions and audit trails track who accessed what and when.
Frequently Asked Questions
What does Temporal add to an AI agent that a simple retry loop does not?
A retry loop handles individual function failures. Temporal handles entire workflow failures. If your process crashes after completing 40 of 50 steps, a retry loop starts over. Temporal resumes from step 41. It also provides visibility into running workflows, automatic state persistence, and built-in support for timeouts, signals, and child workflows.
How do I store files and artifacts from Temporal agent workflows?
Use the claim-check pattern. Store large outputs in external storage like Fast.io, S3, or Google Cloud Storage, and pass a small reference through the workflow. This keeps Temporal's event history lean and your artifacts durable. Fast.io's MCP server provides 19 tools for workspace and storage operations that agents can call directly from Temporal activities.
Can Temporal handle multi-agent systems where agents coordinate?
Yes. Temporal supports child workflows, signals, and shared state. Route requests to specialized agents using a router workflow, delegate subtasks with child workflows, or coordinate through shared storage. File locks prevent conflicts when multiple agents write to the same workspace.
What LLM frameworks work with Temporal?
Temporal has official integrations with the OpenAI Agents SDK and Pydantic AI. The Python and TypeScript SDKs work with any LLM framework, including LangChain, LlamaIndex, and direct API calls. Wrap your LLM calls in Temporal activities and the orchestration layer handles retries and persistence.
Is Temporal overkill for simple agent tasks?
For a single LLM call with a retry, yes. Temporal adds value when workflows have multiple steps, run for more than a few minutes, need human-in-the-loop approval, or coordinate multiple agents. If your agent is a single prompt-and-response, a simple retry wrapper is enough.