How to Configure Persistent Memory in Hermes Agent
Hermes Agent ships with a dual-layer memory system: bounded local files for curated facts and FTS5 full-text search across every past session. This guide walks through both built-in memory and the eight pluggable external providers, so you can pick the right persistence strategy for your deployment.
How Hermes Agent Memory Works
Most AI agents forget everything when a session ends. Hermes Agent takes a different approach: it maintains two bounded files that persist across every conversation, plus a searchable archive of every past session stored in SQLite.
The built-in memory layer uses two files stored in ~/.hermes/memories/:
- MEMORY.md (roughly 800 tokens): Agent-managed observations about your environment, project conventions, discovered workarounds, and task completion records.
- USER.md (roughly 500 tokens): Your communication preferences, identity details, and workflow patterns.
Both files inject into the system prompt as a frozen snapshot at session start, which preserves LLM prefix cache performance. Changes persist to disk immediately but only affect subsequent sessions.
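The frozen-snapshot behavior can be pictured with a short Python sketch. This is an illustration only, not Hermes code: the class `MemorySession` and its `remember` method are hypothetical names, and only the two file names come from the description above.

```python
from pathlib import Path
import tempfile
from typing import Tuple


class MemorySession:
    """Hypothetical illustration of a frozen memory snapshot (not Hermes code)."""

    def __init__(self, memory_dir: Path) -> None:
        self.memory_dir = memory_dir
        # Read both files once; this text stays fixed for the whole session.
        parts = []
        for name in ("MEMORY.md", "USER.md"):
            path = memory_dir / name
            parts.append(path.read_text(encoding="utf-8") if path.exists() else "")
        self.system_prompt_memory = "\n".join(parts)

    def remember(self, fact: str) -> None:
        # Persist to disk immediately, but leave the in-session snapshot untouched.
        with (self.memory_dir / "MEMORY.md").open("a", encoding="utf-8") as fh:
            fh.write(fact + "\n")


def demo() -> Tuple[bool, bool]:
    """Return whether a new fact is visible in the current and in the next session."""
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp)
        (memory_dir / "MEMORY.md").write_text("Uses pnpm, not npm\n", encoding="utf-8")
        first = MemorySession(memory_dir)
        first.remember("Staging deploys run on Fridays")
        second = MemorySession(memory_dir)  # a later session re-reads the files
        fact = "Staging deploys"
        return fact in first.system_prompt_memory, fact in second.system_prompt_memory


print(demo())
```

Because the prompt text never changes mid-session, the prefix the model sees stays byte-identical from turn to turn, which is what keeps a prefix cache valid.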
The agent does not wait for you to say "remember this." It proactively saves durable knowledge through periodic nudges, writing facts it determines are worth keeping. It also rejects trivial data, easily searchable facts, large code blocks, and anything ephemeral to a single session.
Capacity is intentionally small. MEMORY.md caps at 2,200 characters and USER.md at 1,375 characters. When approaching those limits, the agent consolidates entries by merging related facts before adding new ones. This keeps memory lean and relevant rather than accumulating noise over months of use.
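To see how close each file is to its cap, a small check along these lines may help. It is a sketch under assumptions: the function name `memory_usage` is made up for this example, and it counts Python characters, which may not match exactly how Hermes measures the limit.

```python
from pathlib import Path
from typing import Dict

# Character caps quoted in the text above.
LIMITS = {"MEMORY.md": 2200, "USER.md": 1375}


def memory_usage(memory_dir: Path) -> Dict[str, float]:
    """Return the fraction of each cap currently used (0.0 if the file is missing)."""
    usage = {}
    for name, cap in LIMITS.items():
        path = memory_dir / name
        chars = len(path.read_text(encoding="utf-8")) if path.exists() else 0
        usage[name] = chars / cap
    return usage


if __name__ == "__main__":
    # Default location described earlier in this guide.
    for name, fraction in memory_usage(Path.home() / ".hermes" / "memories").items():
        print(f"{name}: {fraction:.0%} of cap used")
```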
Hermes Built-in Memory: state.db and FTS5
Beyond the two curated files, Hermes stores every CLI and messaging session in a SQLite database at ~/.hermes/state.db. This database uses FTS5 full-text indexing, which means the agent can search its own conversation history using natural language queries.
When you ask something like "what did we decide about the deployment config last week?", the agent queries the FTS5 index, retrieves matching session fragments, and passes them through Gemini Flash summarization. The result is a concise answer pulled from past conversations without loading entire session transcripts into context.
This matters for practical reasons. The curated memory files hold roughly 1,300 tokens combined. That is enough for key facts and preferences, but not for recalling the full reasoning behind a decision you made three weeks ago. FTS5 search fills that gap by making the complete session archive queryable on demand.
The search is fast even across thousands of past messages because FTS5 operates at the SQLite level, not through LLM inference. Only the matching fragments go through summarization, which keeps token costs low and latency reasonable.
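The retrieval step can be sketched with Python's built-in sqlite3 module. The table name, column names, and sample rows below are invented for illustration and are not the actual state.db schema; the sketch only shows how an FTS5 keyword match returns fragments without any LLM call.

```python
import sqlite3
from typing import List, Tuple


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table with a made-up schema (not Hermes's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
    conn.executemany(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)",
        [
            ("s1", "We decided to keep the deployment config in staging.yaml"),
            ("s2", "Lunch plans for Friday"),
            ("s3", "Rolled back the deployment after the config change broke auth"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, keywords: List[str], limit: int = 5) -> List[Tuple[str, str]]:
    """Return (session_id, content) rows containing every keyword, best match first."""
    # Quote each term so punctuation in a keyword cannot be read as FTS5 syntax.
    query = " ".join('"' + word.replace('"', '""') + '"' for word in keywords)
    return conn.execute(
        "SELECT session_id, content FROM messages WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = build_demo_index()
hits = search(conn, ["deployment", "config"])
for session_id, content in hits:
    print(session_id, content)
```

Only the rows returned by `search` would need to be summarized, which is why the cost scales with the number of matches rather than with the size of the archive. If the `CREATE VIRTUAL TABLE` call fails, the local SQLite build was compiled without FTS5.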
No configuration is needed to enable session search. It runs automatically as long as you have not deleted the state database. If you want to verify it is working, you can ask the agent to recall a specific past conversation topic and check whether it retrieves relevant context.
Choosing a Memory Provider
Built-in memory covers most single-user workflows, but Hermes also supports eight external memory providers for teams that need unbounded storage, semantic search, or multi-agent coordination. Only one external provider can be active at a time, and it runs alongside the built-in MEMORY.md and USER.md files.
Here is how the providers compare:
Honcho
Type: Cloud (self-hostable).
Best for: Multi-agent systems that need cross-session user modeling. Honcho uses dialectic reasoning to build a deepening model of each user over time, with configurable reasoning depth (1 to 3 passes). It supports multi-peer setups where different agents share observations about the same user.
OpenViking
Type: Self-hosted (local or cloud).
Best for: Teams that want a filesystem-style knowledge hierarchy. OpenViking organizes memories into six categories with tiered retrieval (L0 at roughly 100 tokens up to L2 with full context). Free under AGPL-3.0.
Mem0
Type: Cloud only.
Best for: Hands-off setups where you want automatic fact extraction without managing infrastructure. Mem0 handles deduplication and reranking server-side. Requires pip install mem0ai and an API key.
Hindsight
Type: Cloud or local (embedded PostgreSQL).
Best for: Knowledge graph use cases with entity resolution. The unique hindsight_reflect tool lets the agent synthesize connections across stored memories. Configuration lives in $HERMES_HOME/hindsight/config.json.
Holographic
Type: Local SQLite.
Best for: Offline or air-gapped deployments. Uses Holographic Reduced Representations (HRR) for compositional algebraic queries. Free with optional NumPy dependency.
RetainDB
Type: Cloud API.
Best for: Production deployments needing hybrid search (Vector + BM25 + reranking) across seven memory types with delta compression. Paid plans available; check the vendor for current pricing.
ByteRover
Type: Local-first CLI with optional cloud sync.
Best for: Developers who prefer a hierarchical knowledge tree. Install with npm install -g byterover-cli. Data stored in $HERMES_HOME/byterover/, scoped per profile.
Supermemory
Type: Cloud API.
Best for: Long-term semantic memory with context fencing that prevents recursive memory pollution. Uses profile-scoped containers and ingests full conversations at session end. Requires pip install supermemory and an API key.
Give Your Hermes Agent a Persistent Workspace
Pair Hermes memory with Fast.io workspaces for file persistence, semantic search, and human handoff. 50GB free storage, no credit card required.
Setting Up an External Memory Provider
Configuration takes three steps: install the provider, set credentials, and activate it in the Hermes config file.
Interactive setup
The fast path is the guided wizard:
hermes memory setup
The wizard walks you through provider selection and credential entry, then writes everything to ~/.hermes/config.yaml automatically.
Manual setup
If you prefer editing config files directly, open ~/.hermes/config.yaml and set the provider name:
memory:
  provider: honcho
Then add provider-specific credentials to your profile's .env file. Each provider reads its own environment variable, so set only the one that matches the provider you selected:
MEM0_API_KEY=your-key
OPENVIKING_ENDPOINT=http://localhost:1933
HINDSIGHT_API_KEY=your-key
SUPERMEMORY_API_KEY=your-key
RETAINDB_API_KEY=your-key
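Before activating a provider, it can be useful to confirm that the matching variable is actually set. The sketch below is a hypothetical helper, not part of Hermes: the variable names are the ones listed above, while the lowercase provider keys are assumed to follow the same spelling as `honcho` in config.yaml.

```python
from typing import Dict, Optional

# Variable names come from the list above; the provider keys are an assumption.
REQUIRED_VARS = {
    "mem0": "MEM0_API_KEY",
    "openviking": "OPENVIKING_ENDPOINT",
    "hindsight": "HINDSIGHT_API_KEY",
    "supermemory": "SUPERMEMORY_API_KEY",
    "retaindb": "RETAINDB_API_KEY",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_variable(provider: str, env_text: str) -> Optional[str]:
    """Return the variable still unset for this provider, or None if nothing is missing."""
    name = REQUIRED_VARS.get(provider)
    if name is None:
        return None  # provider is not covered by the list above
    return None if parse_env(env_text).get(name) else name


print(missing_variable("hindsight", "MEM0_API_KEY=your-key\n"))
```

The parser handles only plain `KEY=value` lines; quoted values and `export` prefixes are left out to keep the example short.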
Verify the provider is active
hermes memory status
This prints the active provider name, connection status, and current memory count.
Disable an external provider
hermes memory off
This deactivates the external provider while keeping built-in MEMORY.md and USER.md intact.
Once active, the provider automatically injects relevant context into the system prompt, prefetches memories before each turn, syncs conversation data after responses, and extracts durable memories at session end. It also mirrors any writes to the built-in memory files, so you get both local and external persistence without managing them separately.
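The order of those steps can be sketched as a small Python loop. Everything here is hypothetical: the hook names (`inject_context`, `prefetch`, `sync_turn`, `extract_memories`) are invented for illustration and do not describe the real Hermes provider interface, and mirroring of built-in memory writes is left out for brevity.

```python
from typing import List


class RecordingProvider:
    """Hypothetical provider that only records which hook was called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def inject_context(self) -> str:
        self.calls.append("inject_context")  # once, when the session starts
        return ""

    def prefetch(self, user_message: str) -> List[str]:
        self.calls.append("prefetch")  # before each turn
        return []

    def sync_turn(self, user_message: str, reply: str) -> None:
        self.calls.append("sync_turn")  # after each response

    def extract_memories(self) -> None:
        self.calls.append("extract_memories")  # once, when the session ends


def run_session(provider: RecordingProvider, turns: List[str]) -> None:
    """Drive the hooks in the order described in the paragraph above."""
    provider.inject_context()
    for user_message in turns:
        provider.prefetch(user_message)
        reply = f"echo: {user_message}"  # stand-in for the model call
        provider.sync_turn(user_message, reply)
    provider.extract_memories()


provider = RecordingProvider()
run_session(provider, ["first question", "second question"])
print(provider.calls)
```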
Profile Isolation and Multi-Agent Deployments
Hermes scopes all memory data by profile, which matters when you run multiple agents or serve different users from the same installation.
For local providers like Holographic and ByteRover, data lives in profile-specific subdirectories under $HERMES_HOME/. Each profile gets its own SQLite database, its own knowledge tree, and its own credential files. There is no cross-contamination between profiles.
Cloud providers handle isolation differently. Honcho derives project names from the profile, so each profile's memories stay in separate cloud containers. Supermemory uses an {identity} template to scope containers per profile. Mem0 and RetainDB use profile-specific API key files, so each profile authenticates to its own cloud namespace.
This profile isolation is what makes Hermes viable for multi-agent deployments. You can run a research agent and a writing agent on the same machine, each with its own memory provider and its own accumulated knowledge, without either agent accessing the other's data.
For teams coordinating multiple Hermes instances, Honcho's multi-peer setup is worth evaluating. It lets different agents share observations about the same user while maintaining separate memory stores, which is useful when a customer-facing agent and a backend agent both need context about the same person.
When agents produce files that need to outlive any single session, a persistent workspace becomes essential. Fast.io provides shared workspaces where agents write outputs, humans review them, and everything stays versioned and searchable. The free agent plan includes 50GB of storage, 5,000 monthly credits, and 5 workspaces with no credit card required. Agents access workspaces through the Fast.io MCP server or REST API, while humans use the same workspaces through the web UI.
Practical Patterns for Memory Configuration
The right memory setup depends on your deployment scenario. Here are tested configurations for common use cases.
Solo developer, single machine
Stick with built-in memory. MEMORY.md and USER.md handle personal preferences and project conventions without any external dependencies. FTS5 session search gives you cross-session recall for free. No API keys, no cloud services, no monthly costs.
Team with shared context needs
Use Honcho or Hindsight. Both support cross-session context that multiple agents or team members can contribute to. Honcho's dialectic modeling deepens understanding of each user over time, while Hindsight's knowledge graph is better for environments where entity relationships matter (connecting a bug report to a deploy to a config change).
Offline or air-gapped environments
Use Holographic. It runs entirely on local SQLite with no network calls. The HRR query system works without an embedding API, though it benefits from NumPy if available. ByteRover also works offline in its default local mode.
High-volume production systems
RetainDB's hybrid search handles large memory stores well because it combines vector similarity, BM25 keyword matching, and a reranking layer. Delta compression keeps storage costs manageable as conversation volume grows.
Minimal management overhead
Mem0 extracts and deduplicates facts server-side, so you do not need to tune extraction rules or manage consolidation. Set the API key and the provider handles the rest.
Regardless of which provider you choose, the built-in MEMORY.md and USER.md files keep working. External providers add depth, but the curated local files remain the fast retrieval path because they inject directly into the system prompt at session start.
For file outputs that need to persist beyond memory, consider pairing your Hermes deployment with Fast.io workspaces. Agents can write research, reports, or generated assets to a shared workspace where the files are automatically indexed for semantic search through Intelligence Mode, then hand off the workspace to a human reviewer when the work is done.
Frequently Asked Questions
How does Hermes Agent remember things between sessions?
Hermes maintains two bounded text files (MEMORY.md and USER.md) in ~/.hermes/memories/ that persist across sessions. The agent proactively writes durable facts to these files through periodic nudges. It also stores every session in a SQLite database with FTS5 full-text search, so it can retrieve and summarize past conversations on demand.
What memory providers does Hermes Agent support?
Hermes supports eight external memory providers: Honcho (cross-session user modeling), OpenViking (filesystem-style hierarchy), Mem0 (automatic fact extraction), Hindsight (knowledge graph), Holographic (local SQLite with HRR queries), RetainDB (hybrid cloud search), ByteRover (hierarchical knowledge tree), and Supermemory (semantic memory with context fencing). Only one can be active at a time alongside built-in memory.
How do I configure Honcho memory in Hermes Agent?
Run 'hermes memory setup' and select Honcho, or manually set 'memory.provider: honcho' in ~/.hermes/config.yaml. You will need a Honcho API key, which you can get from their cloud service or by self-hosting. Honcho is particularly useful for multi-agent systems because it supports cross-peer reasoning and dialectic user modeling with configurable reasoning depth.
Can Hermes Agent search its own conversation history?
Yes. Every session is stored in ~/.hermes/state.db with FTS5 full-text indexing. When the agent needs historical context, it searches the index and passes matching fragments through Gemini Flash summarization. This works across thousands of past sessions with low latency because the search runs at the SQLite level.
What is the difference between built-in memory and external providers?
Built-in memory uses two small text files (about 1,300 tokens combined) that inject into the system prompt for fast retrieval. External providers offer unbounded storage, semantic search, knowledge graphs, or team-shared context. Both run simultaneously when a provider is active, with writes mirrored to both systems.
Does Hermes Agent memory work offline?
The built-in MEMORY.md, USER.md, and FTS5 session search all work fully offline. For external providers, Holographic (local SQLite) and ByteRover (local-first CLI) work without network access. Cloud providers like Mem0, RetainDB, and Supermemory require an internet connection.