How to Implement LangGraph Persistence for Long-Term Memory
LangGraph persistence allows AI agents to maintain state and memory across multiple sessions by saving graph checkpoints to a database. This guide covers how to set up Postgres checkpointers, manage long-term agent memory, and handle human-in-the-loop workflows without losing context.
What is LangGraph Persistence?
LangGraph persistence lets AI agents save their state after every execution step. Without it, agents are ephemeral: when the script ends, the conversation memory disappears. Persistence writes a "checkpoint" to a database or file every time the graph moves to a new node.

Persistence is built around threads. A thread represents one conversation or task. By giving every action a thread_id, LangGraph keeps different users and tasks separate. When you resume a thread, the agent picks up where it left off, reconstructing its internal state from the last checkpoint. This enables three production features:
- Long-term Memory: Agents resume conversations days or weeks later, remembering details from earlier sessions.
- Human-in-the-Loop: Pause an agent, wait for human approval (like a high-value purchase or sensitive email), then resume from the same state.
- Error Recovery: If an agent crashes or infrastructure fails, replay from the last checkpoint instead of starting over. This saves time and API costs.
Helpful references: Fast.io Workspaces, Fast.io Collaboration, and Fast.io AI.
How Checkpointers Work
In LangGraph, a Checkpointer is a class that saves and loads the graph's state. When you compile your graph, you pass a checkpointer instance. This connects the graph's execution logic to your storage backend. The workflow is automatic. It follows a "read-execute-write" cycle:
1. Retrieve: The user sends a message with a thread_id. LangGraph queries the database for the latest checkpoint matching that ID.
2. Initialize: It loads the state (variables, message history, internal flags) into memory.
3. Execute: The agent processes the input, moving through the graph's nodes.
4. Persist: After each step, the new state is serialized and saved to the database as a new checkpoint.

LangGraph doesn't overwrite the previous state. It keeps a history of checkpoints. This enables "time travel": you can return an agent to an earlier state (if a tool call failed or the user wants to try a different path) and branch from there. This design separates compute (the running agent) from memory (the database). Any worker node can pick up any task, making deployments stateless and scalable.
Step-by-Step: Setting up InMemory Checkpointer
For local testing, use the MemorySaver. This stores checkpoints in RAM. It's fast and requires no configuration, but clears when the process restarts. Good for prototyping and unit testing your graph logic.
1. Install LangGraph
```
pip install -U langgraph
```
2. Import and Initialize
```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph

# Initialize the checkpointer
checkpointer = MemorySaver()

# Build your graph (simplified example)
builder = StateGraph(...)
# ... define nodes and edges ...

# Compile with the checkpointer
graph = builder.compile(checkpointer=checkpointer)
```
3. Run with a Thread ID
To use persistence, provide a config dictionary with a thread_id. This ID unlocks the agent's memory.

```python
config = {"configurable": {"thread_id": "user-123"}}

# First interaction
response = graph.invoke({"messages": [("user", "Hi, I'm Tom")]}, config=config)

# Second interaction - the agent remembers "Tom"
response = graph.invoke({"messages": [("user", "What is my name?")]}, config=config)
```
Production: Implementing Postgres Persistence
For production, MemorySaver won't work. You need a durable database that handles concurrent requests and provides long-term reliability. PostgreSQL is the common choice for LangGraph persistence. It supports JSONB for complex state objects and maintains transactional integrity.
1. Install Dependencies
```
pip install psycopg psycopg-pool langgraph-checkpoint-postgres
```
2. Configure the PostgresSaver
In production, use a connection pool to manage database resources. Use PostgresSaver for synchronous code or AsyncPostgresSaver for async applications.

```python
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg_pool import ConnectionPool

DB_URI = "postgresql://user:pass@localhost:5432/db"

# Create a connection pool as a context manager
with ConnectionPool(DB_URI) as pool:
    checkpointer = PostgresSaver(pool)

    # ONE-TIME SETUP: create the necessary tables
    checkpointer.setup()

    # Compile the graph
    graph = builder.compile(checkpointer=checkpointer)

    # Run the graph
    config = {"configurable": {"thread_id": "session-99"}}
    graph.invoke(..., config=config)
```
**Important Note:** The `checkpointer.setup()` method must be called once to create the schema, including tables like `checkpoints` and `checkpoint_blobs`. In production deployments, handle this as part of a CI/CD migration script or a separate management command. Don't run it inside the primary application runtime. This prevents the app from trying to create tables that already exist or failing due to insufficient permissions during normal operation.
Handling Human-in-the-Loop Workflows
Persistence enables "Human-in-the-Loop" (HITL) patterns where an agent pauses for input. The state is saved to the database, so the human doesn't need to respond immediately. The agent can wait for minutes, hours, or days without consuming compute resources.
How it works:
1. Define Interrupts: Set `interrupt_before` or `interrupt_after` points during graph compilation: `graph.compile(checkpointer=checkpointer, interrupt_before=["approval_node"])`.
2. Execution Pause: Run the agent. It executes nodes sequentially until it reaches the `approval_node`, then saves its state and stops.
3. Dormant State: The saved state sits idle in the database. No Python process needs to stay alive.
4. Human Review: A human operator reviews the current state via a dashboard or UI. They can see what the agent plans to do by inspecting the latest checkpoint.
5. State Modification (Optional): The human can use `graph.update_state()` to modify the agent's memory or pending actions before it continues.
6. Resume: The human sends a "resume" command. The agent loads the updated state from Postgres and picks up where it stopped.
Managing Large Artifacts in Agent State
A common mistake is storing large binary artifacts (PDFs, high-resolution images, massive JSON datasets) directly in the LangGraph state.
The Problem with State Bloat: LangGraph creates a new checkpoint at every step. If your agent state includes a 50MB PDF and the agent takes 10 steps, the checkpointer writes 500MB of data to PostgreSQL. This causes database bloat, slower queries, and higher storage costs. It also slows down state loading.
The Recommended Reference Architecture:
Store large files in specialized storage (like Fast.io) and keep only the reference URL or metadata in the LangGraph state.

1. Ingest and Upload: When the agent receives a file, upload it to Fast.io using a secure API or MCP server.
2. Store Metadata: The agent stores a lightweight reference like {"file_url": "https://fast.io/share/xyz123", "filename": "report.pdf"} in its state.
3. Efficient Persistence: The checkpoint is tiny (a few hundred bytes), keeping your Postgres database lean and fast.
4. Just-in-Time Retrieval: When a node needs the file, it fetches the content using the URL or passes the URL to an LLM with vision or document analysis capabilities.

This keeps your persistence layer performant while your agent retains access to unlimited data volumes.
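The size difference is easy to see without any framework at all. In this sketch the file payload, the Fast.io share URL, and the 5 MB size are all illustrative:

```python
import json

# Illustrative stand-in for a large uploaded file (~5 MB of text)
file_content = "%" + "0" * (5 * 1024 * 1024)

# Anti-pattern: the file content itself lives in the agent state,
# so every checkpoint re-serializes all of it
bloated_state = {"messages": ["summarize this"], "file_content": file_content}

# Recommended: upload the file elsewhere and keep only a reference
# (the URL below is a hypothetical Fast.io share link)
lean_state = {
    "messages": ["summarize this"],
    "file_url": "https://fast.io/share/xyz123",
    "filename": "report.pdf",
}

bloated_size = len(json.dumps(bloated_state))
lean_size = len(json.dumps(lean_state))
```

The bloated checkpoint is megabytes per step; the lean one is around a hundred bytes, and that cost is paid again at every node transition.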
Frequently Asked Questions
How do I save LangGraph state?
You save LangGraph state by passing a 'checkpointer' object (like MemorySaver or PostgresSaver) to the compile() method of your graph. You must also provide a unique thread_id in the config when invoking the graph.
What is the difference between MemorySaver and PostgresSaver?
MemorySaver stores checkpoints in the computer's RAM, so data is lost when the program stops. PostgresSaver stores checkpoints in a PostgreSQL database, allowing state to persist across restarts and long periods of time.
Can I persist large files in LangGraph state?
It is not recommended. Storing large files directly in state bloats your database because a new copy is saved at every step. Instead, store files in external storage like Fast.io and only save the file URL in the LangGraph state.
What is a thread_id in LangGraph?
A thread_id is a unique identifier used to group a series of interactions into a single conversation or task. It allows the checkpointer to retrieve the correct state for a specific user, ensuring that different sessions remain isolated and secure.
Can I go back to a previous state in LangGraph?
Yes, LangGraph persistence supports 'time travel.' By using the thread_id and a specific checkpoint_id, you can inspect previous states, fork the conversation, or even restart from a middle point if an error occurred.
Related Resources
Build Stateful Agents with Fast.io for langgraph persistence
Combine LangGraph's logic with Fast.io's storage. Keep your Postgres database clean by offloading file storage to our high-performance cloud.