AI & Agents

How to Implement LangGraph Persistence for Long-Term Memory

LangGraph persistence allows AI agents to maintain state and memory across multiple sessions by saving graph checkpoints to a database. This guide covers how to set up Postgres checkpointers, manage long-term agent memory, and handle human-in-the-loop workflows without losing context.

Fast.io Editorial Team 8 min read
Persistence layers enable AI agents to 'remember' past interactions.

What is LangGraph Persistence?

LangGraph persistence lets AI agents save their state after every execution step. Without it, agents are ephemeral: when the script ends, the conversation memory disappears. Persistence writes a "checkpoint" to a database or file every time the graph moves to a new node.

Persistence is built around threads. A thread represents one conversation or task. By giving every interaction a thread_id, LangGraph keeps different users and tasks separate. When you resume a thread, the agent picks up where it left off, reconstructing its internal state from the last checkpoint. This enables three production features:

  • Long-term memory: Agents resume conversations days or weeks later, remembering details from earlier sessions.
  • Human-in-the-loop: Pause an agent, wait for human approval (such as a high-value purchase or sensitive email), then resume from the same state.
  • Error recovery: If an agent crashes or infrastructure fails, replay from the last checkpoint instead of starting over, saving time and API costs.

Helpful references: Fast.io Workspaces, Fast.io Collaboration, and Fast.io AI.

How Checkpointers Work

In LangGraph, a Checkpointer is a class that saves and loads the graph's state. When you compile your graph, you pass a checkpointer instance. This connects the graph's execution logic to your storage backend. The workflow is automatic. It follows a "read-execute-write" cycle:

  1. Retrieve: The user sends a message with a thread_id. LangGraph queries the database for the latest checkpoint matching that ID.
  2. Initialize: It loads the state (variables, message history, internal flags) into memory.
  3. Execute: The agent processes the input, moving through the graph's nodes.
  4. Persist: After each step, the new state is serialized and saved to the database as a new checkpoint.

LangGraph doesn't overwrite the previous state. It keeps a history of checkpoints, which enables "time travel": you can return an agent to an earlier state (if a tool call failed or the user wants to try a different path) and branch from there.

This design separates compute (the running agent) from memory (the database). Any worker node can pick up any task, making deployments stateless and scalable.
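The read-execute-write cycle can be sketched in plain Python. This is a conceptual illustration only, not LangGraph's internal implementation; the real checkpointer classes run this cycle against a database.

```python
# Conceptual sketch of the read-execute-write cycle (not LangGraph's code).
store = {}  # thread_id -> list of checkpoints, newest last

def run_step(thread_id, user_input, node):
    # 1. Retrieve: load the latest checkpoint for this thread.
    history = store.setdefault(thread_id, [{"messages": []}])
    state = dict(history[-1])
    # 2-3. Initialize and execute: the node transforms the state.
    state["messages"] = state["messages"] + [user_input, node(user_input)]
    # 4. Persist: append a NEW checkpoint; earlier ones are kept for "time travel".
    history.append(state)
    return state

echo = lambda text: f"echo: {text}"
run_step("t1", "hello", echo)
run_step("t1", "again", echo)
len(store["t1"])  # 3 checkpoints: the empty start plus one per step
```

Because every step appends rather than overwrites, `store["t1"]` retains the full history, which is what makes branching from an earlier checkpoint possible.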

Step-by-Step: Setting up InMemory Checkpointer

For local testing, use the MemorySaver. This stores checkpoints in RAM. It's fast and requires no configuration, but clears when the process restarts. Good for prototyping and unit testing your graph logic.

1. Install LangGraph

```bash
pip install -U langgraph
```

2. Import and Initialize

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph

# Initialize the checkpointer
checkpointer = MemorySaver()

# Build your graph (simplified example)
builder = StateGraph(...)
# ... define nodes and edges ...

# Compile with the checkpointer
graph = builder.compile(checkpointer=checkpointer)
```

3. Run with a Thread ID

To use persistence, provide a config dictionary with a thread_id. This ID unlocks the agent's memory.

```python
config = {"configurable": {"thread_id": "user-123"}}

# First interaction
response = graph.invoke({"messages": [("user", "Hi, I'm Tom")]}, config=config)

# Second interaction - the agent remembers "Tom"
response = graph.invoke({"messages": [("user", "What is my name?")]}, config=config)
```

Log of AI agent state checkpoints

Production: Implementing Postgres Persistence

For production, MemorySaver won't work. You need a durable database that handles concurrent requests and provides long-term reliability. PostgreSQL is the common choice for LangGraph persistence. It supports JSONB for complex state objects and maintains transactional integrity.
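The shape of a durable checkpoint store can be illustrated with SQLite as a stand-in. This sketch is only an analogy for the idea; the real PostgresSaver manages its own schema (created by checkpointer.setup()), and Postgres would store the state as JSONB rather than a JSON text column.

```python
import json
import sqlite3

# Sketch of the durable-storage idea using SQLite as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE checkpoints (
        thread_id TEXT,
        step INTEGER,
        state TEXT,          -- JSON text; Postgres would use JSONB here
        PRIMARY KEY (thread_id, step)
    )
""")

def save(thread_id, step, state):
    conn.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
                 (thread_id, step, json.dumps(state)))

def latest(thread_id):
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1", (thread_id,)).fetchone()
    return json.loads(row[0]) if row else None

save("session-99", 0, {"messages": ["hello"]})
save("session-99", 1, {"messages": ["hello", "hi there"]})
latest("session-99")  # -> {'messages': ['hello', 'hi there']}
```

The key property is the composite key (thread_id, step): each thread keeps an ordered history of checkpoints, and resuming means reading the newest row for that thread.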

1. Install Dependencies

```bash
pip install psycopg psycopg-pool langgraph-checkpoint-postgres
```

2. Configure the PostgresSaver

In production, use a connection pool to manage database resources. Use PostgresSaver for synchronous code or AsyncPostgresSaver for async applications.

```python
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg_pool import ConnectionPool

DB_URI = "postgresql://user:pass@localhost:5432/db"

# Create a connection pool context manager
with ConnectionPool(DB_URI) as pool:
    checkpointer = PostgresSaver(pool)

    # ONE-TIME SETUP: Create the necessary tables
    checkpointer.setup()

    # Compile the graph
    graph = builder.compile(checkpointer=checkpointer)

    # Run the graph
    config = {"configurable": {"thread_id": "session-99"}}
    graph.invoke(..., config=config)
```

**Important Note:** The `checkpointer.setup()` method must be called once to create the schema, including tables like `checkpoints` and `checkpoint_blobs`. In production deployments, handle this as part of a CI/CD migration script or a separate management command. Don't run it inside the primary application runtime. This prevents the app from trying to create tables that already exist or failing due to insufficient permissions during normal operation.
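One way to keep setup out of the application runtime is a standalone management command. The script name and the DATABASE_URI environment variable below are assumptions for illustration; call run_migration() from a CI/CD step, never from the serving process.

```python
# Hypothetical one-off management command (an assumed file such as
# scripts/migrate_checkpoints.py); invoked from a CI/CD deploy step.
import os

def run_migration():
    # Import locally so the app can import this module without the Postgres driver.
    from langgraph.checkpoint.postgres import PostgresSaver

    db_uri = os.environ["DATABASE_URI"]  # assumed environment variable name
    with PostgresSaver.from_conn_string(db_uri) as checkpointer:
        checkpointer.setup()  # creates the checkpoints / checkpoint_blobs tables
```

Running setup() is idempotent in the sense that it creates the schema if missing, but isolating it in a migration step keeps the app's database role limited to read/write permissions.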

Handling Human-in-the-Loop Workflows

Persistence enables "Human-in-the-Loop" (HITL) patterns where an agent pauses for input. The state is saved to the database, so the human doesn't need to respond immediately. The agent can wait for minutes, hours, or days without consuming compute resources.

How it works:

  1. Define interrupts: Set interrupt_before or interrupt_after points during graph compilation: graph.compile(checkpointer=checkpointer, interrupt_before=["approval_node"]).
  2. Execution pause: Run the agent. It executes nodes sequentially until it reaches the approval_node, then saves its state and stops.
  3. Dormant state: The state sits idle in the database. No Python process needs to stay alive.
  4. Human review: A human operator reviews the current state via a dashboard or UI. They can see what the agent plans to do by inspecting the latest checkpoint.
  5. State modification (optional): The human can use graph.update_state() to modify the agent's memory or pending actions before it continues.
  6. Resume: The human sends a "resume" command. The agent loads the updated state from Postgres and picks up where it stopped.
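The pause-review-resume flow above can be simulated in plain Python. This is a conceptual sketch only; in LangGraph the pause comes from interrupt_before plus the checkpointer, and the edit step is graph.update_state(), not hand-rolled functions like these.

```python
# Conceptual simulation of interrupt / review / resume (not LangGraph's API).
saved = {}  # thread_id -> persisted state while the agent is paused

def run_until_approval(thread_id, plan):
    # The agent stops before the sensitive step and persists its state.
    saved[thread_id] = {"pending_action": plan, "status": "awaiting_approval"}
    return saved[thread_id]

def update_state(thread_id, **changes):
    # Mirrors the idea of graph.update_state(): a human edits memory first.
    saved[thread_id].update(changes)

def resume_agent(thread_id):
    state = saved[thread_id]
    if state["status"] != "approved":
        return "still waiting"
    return f"executed: {state['pending_action']}"

run_until_approval("t1", "send refund of $500")
update_state("t1", pending_action="send refund of $50", status="approved")
resume_agent("t1")  # -> "executed: send refund of $50"
```

Note that nothing runs between the pause and the resume: the decision lives entirely in the saved state, which is why the real pattern costs no compute while waiting.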
AI agent waiting for human approval with state history

Managing Large Artifacts in Agent State

A common mistake is storing large binary artifacts (PDFs, high-resolution images, massive JSON datasets) directly in the LangGraph state.

The Problem with State Bloat: LangGraph creates a new checkpoint at every step. If your agent state includes a 50MB PDF and the agent takes 10 steps, the checkpointer writes 500MB of data to PostgreSQL. This causes database bloat, slower queries, and higher storage costs. It also slows down state loading.

The Recommended Reference Architecture: Store large files in specialized storage (like Fast.io) and keep only the reference URL or metadata in the LangGraph state.

  1. Ingest and upload: When the agent receives a file, upload it to Fast.io using a secure API or MCP server.
  2. Store metadata: The agent stores a lightweight reference like {"file_url": "https://fast.io/share/xyz123", "filename": "report.pdf"} in its state.
  3. Efficient persistence: The checkpoint is tiny (a few hundred bytes), keeping your Postgres database lean and fast.
  4. Just-in-time retrieval: When a node needs the file, it fetches the content using the URL or passes the URL to an LLM with vision or document analysis capabilities.

This keeps your persistence layer performant while your agent retains access to unlimited data volumes.
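The size difference is easy to demonstrate. The sketch below uses a 1 MB stand-in for the file (the article's 50 MB PDF makes the gap far larger) and a made-up share URL; it simply serializes both versions of the state and compares payload sizes.

```python
import json

# Illustration only: checkpoint payload size with embedded bytes vs a reference.
pdf_bytes = b"\x00" * (1024 * 1024)  # 1 MB stand-in for a real PDF

bloated_state = {"messages": [], "attachment": pdf_bytes.hex()}
lean_state = {
    "messages": [],
    "attachment": {"file_url": "https://fast.io/share/xyz123",  # example ref
                   "filename": "report.pdf"},
}

bloated_size = len(json.dumps(bloated_state))  # megabytes per checkpoint
lean_size = len(json.dumps(lean_state))        # a few hundred bytes

# Every graph step writes a new checkpoint, so state size is multiplied:
steps = 10
total_bloated = bloated_size * steps
total_lean = lean_size * steps
```

Because a checkpoint is written at every step, whatever lives in state is paid for repeatedly; the reference version keeps that recurring cost to a few hundred bytes.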

Frequently Asked Questions

How do I save LangGraph state?

You save LangGraph state by passing a 'checkpointer' object (like MemorySaver or PostgresSaver) to the compile() method of your graph. You must also provide a unique thread_id in the config when invoking the graph.

What is the difference between MemorySaver and PostgresSaver?

MemorySaver stores checkpoints in the computer's RAM, so data is lost when the program stops. PostgresSaver stores checkpoints in a PostgreSQL database, allowing state to persist across restarts and long periods of time.

Can I persist large files in LangGraph state?

It is not recommended. Storing large files directly in state bloats your database because a new copy is saved at every step. Instead, store files in external storage like Fast.io and only save the file URL in the LangGraph state.

What is a thread_id in LangGraph?

A thread_id is a unique identifier used to group a series of interactions into a single conversation or task. It allows the checkpointer to retrieve the correct state for a specific user, ensuring that different sessions remain isolated and secure.

Can I go back to a previous state in LangGraph?

Yes, LangGraph persistence supports 'time travel.' By using the thread_id and a specific checkpoint_id, you can inspect previous states, fork the conversation, or even restart from a middle point if an error occurred.
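The time-travel config has the following shape. The checkpoint_id value here is a made-up example; real ids are obtained from the checkpoint history (for example via graph.get_state_history(config) in LangGraph).

```python
# Shape of the config used for time travel; the checkpoint_id is illustrative.
config = {
    "configurable": {
        "thread_id": "user-123",
        "checkpoint_id": "1ef4f797-8335-6428-8001-8a1503f9b875",
    }
}
# Passing this config to graph.invoke(...) replays from that checkpoint
# and forks the thread from there.
```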

Related Resources

Fast.io features

Build Stateful Agents with Fast.io for LangGraph Persistence

Combine LangGraph's logic with Fast.io's storage. Keep your Postgres database clean by offloading file storage to our high-performance cloud.