How to Deploy Hermes Agent on Kubernetes
Hermes Agent's official docs cover six terminal backends, but Kubernetes is not one of them. A community-maintained Helm chart fills the gap with single-replica enforcement, PVC-backed state, and Istio-ready networking. This guide walks through installing the chart, configuring persistent storage, securing the deployment with RBAC and NetworkPolicy, running scheduled jobs alongside the gateway, and connecting external storage for file handoff.
Why Kubernetes for an Autonomous Agent
Two-thirds of organizations hosting generative AI models already use Kubernetes for some or all inference workloads, according to the CNCF 2025 Annual Cloud Native Survey. But inference is stateless. Autonomous agents like Hermes are not. They maintain persistent memory, accumulate skills across sessions, and run long-lived gateway processes that accept requests over HTTP. That mismatch between stateless orchestration defaults and stateful agent requirements is exactly what makes Kubernetes deployment worth documenting.
Nous Research's Hermes Agent officially supports six terminal backends: local, Docker, SSH, Daytona, Singularity, and Modal. Kubernetes is not on that list. The v0.8.0 remote backend runs on a $5 VPS, which is a perfectly good option for a single agent. But if you already run a Kubernetes cluster, you get restart policies, resource limits, secret management, network policies, and observability without building any of them yourself. You also get CronJobs for scheduled automations without relying on the agent's internal scheduler, which can contend with the gateway process for shared state.
The community-maintained Helm chart at ultraworkers/hermes-agent-helm-chart packages Hermes for Kubernetes with opinionated defaults: single-replica enforcement when persistence is on, non-root execution, dropped Linux capabilities, and RuntimeDefault seccomp. This guide uses that chart.
Prerequisites and Cluster Setup
You need a running Kubernetes cluster (1.24+), Helm 3, kubectl configured for your cluster, and an API key for your LLM provider. The chart defaults to OpenRouter, so an OPENROUTER_API_KEY is the minimum credential. If you use Anthropic or OpenAI directly, you will pass that key instead.
For local development, MicroK8s or minikube both work. Mo Figueroa documented a complete MicroK8s deployment in April 2026, noting one important constraint: the MicroK8s local registry at localhost:32000 is node-local, so it only works for pods scheduled on the same node that runs the registry. If you are on a single-node cluster, that is fine. Multi-node clusters need a shared registry.
For production, any managed Kubernetes service works: EKS, GKE, AKS, or a self-managed cluster. The chart has no cloud-specific dependencies. You will want a StorageClass that supports ReadWriteOnce PersistentVolumeClaims, which every major provider includes by default.
Verify your cluster is ready:
kubectl cluster-info
helm version
Both commands should return without errors before proceeding.
Installing the Helm Chart
Clone the chart repository and install with minimal configuration. The chart is not published to a Helm registry, so you install from the local directory:
git clone https://github.com/ultraworkers/hermes-agent-helm-chart.git
cd hermes-agent-helm-chart
Create a values.yaml file with your API key and preferred model:
secrets:
  OPENROUTER_API_KEY: sk-or-your-key-here
config:
  values:
    model:
      default: anthropic/claude-sonnet-4-6
Install the chart into a dedicated namespace:
helm install hermes . \
  --namespace hermes \
  --create-namespace \
  -f values.yaml
The chart deploys a single pod running hermes gateway run, which starts the agent as a background HTTP service. By default, persistence is enabled, which means the chart locks replicaCount to 1 and sets strategy.type to Recreate. This prevents two pods from writing to the same PersistentVolume simultaneously, which would corrupt Hermes' memory and skill state.
Verify the deployment:
kubectl get pods -n hermes
kubectl logs -n hermes deployment/hermes -f
You should see the gateway start and begin listening on port 8642.
Persist Hermes Agent output across sessions and teams
Free 50GB workspace with auto-indexing, MCP server access, and branded share links. Connect your Kubernetes-deployed agent to a storage layer humans can actually use, no credit card required.
How to Protect Agent State With Persistent Volumes
Hermes Agent stores everything under /opt/data: memory embeddings, learned skills, conversation history, configuration, and the SOUL.md personality file. Losing this directory means starting over. The Helm chart mounts a PVC at that path by default.
The chart's state safety guardrails deserve attention. When persistence.enabled is true (the default), two constraints activate automatically:
- replicaCount locks to 1, regardless of what you set
- strategy.type forces to Recreate, not RollingUpdate
The Recreate strategy means the old pod terminates completely before the new one starts. This creates a brief downtime during upgrades but prevents two pods from mounting the same volume. For an agent that writes to a SQLite database and flat files, this is the correct tradeoff.
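The two guardrails map onto ordinary Deployment fields. The manifest below is a minimal sketch of what they look like in plain Kubernetes terms, not the chart's actual template. The `app: hermes` label, the image reference, and the claim name are placeholders, and the `command` assumes the `hermes` binary is on the container's PATH.

```yaml
# Illustrative only: the fields that matter are replicas and strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes
  namespace: hermes
spec:
  replicas: 1                 # a second replica would write to the same volume
  strategy:
    type: Recreate            # old pod terminates before the new one starts
  selector:
    matchLabels:
      app: hermes             # hypothetical label
  template:
    metadata:
      labels:
        app: hermes
    spec:
      containers:
        - name: hermes
          image: registry.example.com/hermes-agent:your-tag   # placeholder image
          command: ["hermes", "gateway", "run"]               # assumes hermes is on PATH
          volumeMounts:
            - name: data
              mountPath: /opt/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-hermes-pvc                          # placeholder claim name
```

With RollingUpdate, Kubernetes would start the replacement pod before stopping the old one, which is the overlap the chart is avoiding.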
If you need to bring your own storage, the chart accepts an existing PVC:
persistence:
  enabled: true
  existingClaim: my-hermes-pvc
For production, size the PVC generously. Hermes accumulates skills, memory entries, and conversation logs over time. Starting with 10Gi is reasonable for a single agent, but monitor usage as the agent learns. The vector database backing semantic memory can grow depending on how many documents the agent processes.
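If you bring your own claim, it has to exist in the release namespace before the pod starts. A minimal sketch of such a claim, using the `my-hermes-pvc` name and the 10Gi starting size from above, might look like this. The commented-out storage class name is an assumption and varies by cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-hermes-pvc
  namespace: hermes
spec:
  accessModes:
    - ReadWriteOnce           # one node mounts the volume read-write
  resources:
    requests:
      storage: 10Gi           # starting size; watch usage as the agent learns
  # storageClassName: standard   # assumption: set only to pin a specific class
```

Leaving `storageClassName` unset uses the cluster's default StorageClass. Whether the claim can be grown later depends on that class allowing volume expansion.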
The bootstrap system seeds the initial state. Enable it to have Helm manage your config.yaml and SOUL.md:
bootstrap:
  enabled: true
  overwrite: true
With overwrite: true, every helm upgrade resets configuration to match your values file. This makes Helm the source of truth, which is useful for GitOps workflows where configuration lives in a repository.
Security, RBAC, and Network Policies
The chart ships with restrictive security defaults. The container runs as a non-root user with dropped Linux capabilities and a RuntimeDefault seccomp profile. The service account does not auto-mount the Kubernetes API token, which means the Hermes process cannot interact with the Kubernetes API unless you explicitly enable it.
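For readers who have not set these fields by hand, the defaults described above correspond roughly to the following pod-level settings. This is a sketch in a standalone Pod, not the chart's rendered output. The UID, the image reference, and the choice to drop ALL capabilities are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hermes-hardening-example       # hypothetical name
  namespace: hermes
spec:
  automountServiceAccountToken: false  # no Kubernetes API token in the pod
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                   # hypothetical non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: hermes
      image: registry.example.com/hermes-agent:your-tag   # placeholder image
      securityContext:
        capabilities:
          drop: ["ALL"]                # assumption: the chart may drop a narrower set
```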
Enable RBAC only if your Hermes agent needs to read Kubernetes resources, which is unusual for most deployments:
rbac:
  create: true
serviceAccount:
  create: true
  automountServiceAccountToken: true
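What the chart grants when RBAC is on is not shown here, so treat the following as a generic sketch of a narrowly scoped, read-only grant rather than the chart's own rules. The Role name, the choice of `pods` as the resource, and the `hermes` ServiceAccount name are all assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: hermes-read-pods          # hypothetical name
  namespace: hermes
rules:
  - apiGroups: [""]               # core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: hermes-read-pods
  namespace: hermes
subjects:
  - kind: ServiceAccount
    name: hermes                  # assumption: the release's ServiceAccount name
    namespace: hermes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: hermes-read-pods
```

A namespaced Role keeps the grant inside the `hermes` namespace. A ClusterRole would widen it to the whole cluster, which an agent rarely needs.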
Network policies provide tenant-scoped isolation. When enabled, they restrict which pods can reach the Hermes service and which external endpoints Hermes can call:
networkPolicy:
  enabled: true
The default policy allows ingress on the service ports and egress to the LLM provider endpoints. Customize the rules for your environment.
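As a starting point for customization, here is a sketch of a policy with that general shape: ingress on the gateway port, egress for DNS and HTTPS. It is not the chart's rendered policy. The `app: hermes` pod label and the `ingress-nginx` namespace are assumptions about your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: hermes-gateway            # hypothetical name
  namespace: hermes
spec:
  podSelector:
    matchLabels:
      app: hermes                 # hypothetical pod label
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 8642
  egress:
    - ports:                      # DNS lookups
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - ports:                      # HTTPS to the LLM provider
        - protocol: TCP
          port: 443
```

Standard NetworkPolicy rules match on pods, namespaces, and IP blocks, not hostnames, so the HTTPS rule here allows port 443 to any destination. Restricting egress to a specific provider means adding `ipBlock` entries or using a CNI that supports DNS-based rules.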
For secrets management, you have two paths. The simpler path puts API keys directly in the chart values, where they end up in a Kubernetes Secret. The production path uses an external secret store:
externalSecret:
  enabled: true
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-secrets
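This works alongside the External Secrets Operator to pull credentials from AWS Secrets Manager, HashiCorp Vault, or whatever backend your cluster uses. The operator re-syncs the Kubernetes Secret hourly, so a rotated key reaches the cluster without a new Helm release. If the key is injected as an environment variable, the pod still has to restart to pick up the new value.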
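For orientation, the kind of ExternalSecret resource these values correspond to looks roughly like the sketch below. The resource name, the target Secret name, and the remote key path are hypothetical. The `apiVersion` depends on your operator release, which may serve `v1beta1` or `v1`.

```yaml
apiVersion: external-secrets.io/v1beta1   # newer operator releases use external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: hermes-api-keys                   # hypothetical name
  namespace: hermes
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-secrets
  target:
    name: hermes-api-keys                 # Kubernetes Secret the operator creates
  data:
    - secretKey: OPENROUTER_API_KEY       # key inside the resulting Secret
      remoteRef:
        key: hermes/openrouter            # hypothetical path in the backing store
```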
For multi-tenant deployments, the chart recommends one Helm release per tenant with dedicated namespace, PVC, Secret, ServiceAccount, and ingress host. This provides hard isolation between agents that serve different teams or customers.
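One way to organize this is a small values file per tenant. The sketch below uses only chart keys that appear elsewhere in this guide. The tenant name `acme`, the file name, and the host are hypothetical.

```yaml
# values-tenant-acme.yaml (hypothetical tenant "acme")
persistence:
  enabled: true                   # the chart provisions a PVC for this release
networkPolicy:
  enabled: true
externalSecret:
  enabled: true
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-secrets
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: acme.hermes.example.com
      paths:
        - path: /
          pathType: Prefix
```

Installing it with a release name and namespace of its own, for example `helm install hermes-acme . --namespace hermes-acme --create-namespace -f values-tenant-acme.yaml`, keeps each tenant's PVC, Secret, and ServiceAccount separate.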
Scheduled Jobs and Service Exposure
Hermes Agent has built-in scheduled automations, but running them inside the gateway pod creates contention for shared state. A pattern that works better on Kubernetes is separating the stateful gateway from stateless CronJobs.
The gateway Deployment handles incoming requests and maintains persistent state. Kubernetes CronJobs handle scheduled tasks like daily briefings, data collection, or report generation. Each CronJob runs in a short-lived pod with an emptyDir volume for scratch space, writes its output to a shared delivery queue on the PVC, and exits. The gateway picks up queued results on its next cycle.
This separation avoids the state contention that Mo Figueroa identified when running everything in a single pod: "Hermes-internal scheduling proved inadequate for Kubernetes deployments due to shared state contention with gateway."
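A minimal sketch of the CronJob side of that pattern follows. Almost everything specific in it is an assumption: the image reference, the `/scripts/daily-briefing.sh` entry point, the `queue` subdirectory, and the claim name are placeholders, since the layout of the delivery queue is not spelled out here.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hermes-daily-briefing           # hypothetical name
  namespace: hermes
spec:
  schedule: "0 7 * * *"                 # every day at 07:00
  concurrencyPolicy: Forbid             # never overlap two runs of the same job
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: briefing
              image: registry.example.com/hermes-agent:your-tag   # placeholder image
              command: ["/bin/sh", "-c", "/scripts/daily-briefing.sh"]   # hypothetical script
              volumeMounts:
                - name: scratch
                  mountPath: /tmp/work
                - name: hermes-data
                  mountPath: /opt/data/queue
                  subPath: queue        # hypothetical queue directory on the PVC
          volumes:
            - name: scratch
              emptyDir: {}              # discarded when the pod exits
            - name: hermes-data
              persistentVolumeClaim:
                claimName: my-hermes-pvc   # placeholder: the gateway's claim
```

With a ReadWriteOnce claim, the job pod can only mount the volume if it is scheduled onto the same node as the gateway pod. On multi-node clusters that means adding pod affinity or using storage that supports ReadWriteMany.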
To expose the gateway externally, enable the service and ingress:
apiServer:
  enabled: true
  host: 0.0.0.0
  port: 8642
service:
  enabled: true
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: hermes.example.com
      paths:
        - path: /
          pathType: Prefix
For Istio service mesh environments, use the VirtualService instead:
virtualService:
  enabled: true
  gateways:
    - istio-system/public-gateway
  hosts:
    - hermes.example.com
  timeout: 3600s
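The 3600-second timeout is intentional. Agent conversations can run long, and a short route timeout will kill active sessions. Older Istio releases defaulted to 15 seconds, while current releases leave the HTTP timeout disabled unless one is set, so check what your mesh applies.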
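In Istio's own terms, those values correspond to a VirtualService along the lines of the sketch below. The resource name and the destination host are assumptions based on a Service named `hermes` in the `hermes` namespace. Newer Istio releases also accept `networking.istio.io/v1` as the API version.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hermes                          # hypothetical name
  namespace: hermes
spec:
  gateways:
    - istio-system/public-gateway
  hosts:
    - hermes.example.com
  http:
    - timeout: 3600s                    # per-request timeout for this route
      route:
        - destination:
            host: hermes.hermes.svc.cluster.local   # assumed Service DNS name
            port:
              number: 8642
```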
How to Connect External Storage for File Handoff
Kubernetes gives Hermes a stable runtime, but the files an agent produces often need to reach people who do not have kubectl access. Local PVCs work for the agent's internal state. They do not work as a handoff mechanism for clients, collaborators, or downstream teams.
You have several options for bridging agent output to humans. S3-compatible storage works if your team already lives in AWS or MinIO. Google Drive and Dropbox work for smaller teams. For agent-specific workflows where files need to be indexed, searchable, and shareable with granular permissions, Fast.io provides workspaces designed for exactly this pattern.
Fast.io workspaces act as the persistent layer between agent and human. The agent writes files through the Fast.io MCP server or REST API, and humans access them through branded share links with audit trails. Intelligence Mode auto-indexes uploaded files for semantic search, so a team member can ask "what did the agent produce last week about Q3 projections?" and get cited answers without digging through folders.
The practical setup: mount your Hermes PVC for internal state, and configure the agent to upload deliverables to a Fast.io workspace. The MCP server exposes 19 consolidated tools for workspace operations, file management, and AI queries. Hermes can create workspaces, upload files, set permissions, and transfer ownership to a human when the work is done.
The free tier includes 50GB storage, 5,000 AI credits per month, and 5 workspaces with no credit card required. For a Kubernetes-deployed agent producing regular output, that covers a meaningful amount of file handoff before you hit any limits.
Other agent frameworks like OpenClaw use a similar pattern: the agent runs on infrastructure you control, but output flows to a shared workspace where humans can review, approve, and distribute it. The storage layer and the compute layer do not need to be the same system, and in most production setups, they should not be.
Frequently Asked Questions
Can you run Hermes Agent on Kubernetes?
Yes, using the community-maintained Helm chart at ultraworkers/hermes-agent-helm-chart. Nous Research does not officially support Kubernetes as a terminal backend, but the chart packages Hermes with cloud-native defaults including persistent storage, RBAC, NetworkPolicy, and Istio VirtualService support.
How do you scale Hermes Agent in production?
You do not scale a single Hermes instance horizontally. The Helm chart enforces replicaCount of 1 when persistence is enabled because the agent writes to a local SQLite database and flat files that cannot be safely shared across pods. To run multiple agents, deploy separate Helm releases with dedicated PVCs, secrets, and namespaces.
Is Docker or Kubernetes better for AI agents?
Docker is simpler for single-agent deployments. A docker-compose.yml with a volume mount and restart policy handles most cases. Kubernetes adds value when you need CronJob scheduling, network policies, secret rotation via External Secrets Operator, multi-tenant isolation, or integration with existing cluster observability. The v0.8.0 remote backend on a $5 VPS is also worth considering if you want something lighter than either option.
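For comparison, a Compose file for the single-agent case might look like the sketch below. The image reference is a placeholder, and the command and the `/opt/data` path are carried over from the Kubernetes setup in this guide as assumptions. Check the official Docker instructions for the real values.

```yaml
# docker-compose.yml: illustrative single-agent setup
services:
  hermes:
    image: registry.example.com/hermes-agent:your-tag   # placeholder image
    command: ["hermes", "gateway", "run"]               # assumes hermes is on PATH
    restart: unless-stopped                             # restart policy
    environment:
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}         # read from the host environment
    ports:
      - "8642:8642"
    volumes:
      - hermes-data:/opt/data                           # named volume for agent state

volumes:
  hermes-data:
```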
What is the Hermes Agent Helm chart?
The Hermes Agent Helm chart is a community-maintained Kubernetes package at github.com/ultraworkers/hermes-agent-helm-chart. It deploys Hermes as a single-replica Deployment with a PVC-backed home directory, optional bootstrap for config and SOUL.md, external secrets integration, and support for Kubernetes Ingress or Istio VirtualService. It is explicitly unofficial and not maintained by Nous Research.
Does Hermes Agent support multi-tenant Kubernetes deployments?
The Helm chart supports multi-tenant deployments through two approaches. In direct mode, you deploy one Helm release per tenant with a dedicated namespace, PVC, and secret. In operator-ready mode, the chart renders HermesTenant custom resources that a compatible controller can reconcile. Direct mode is simpler and recommended for most teams.