Is Hermes Agent safe to run?

Hermes Agent includes seven security layers by default: command approval, user authorization, container isolation, MCP credential filtering, context file scanning, cross-session isolation, and input sanitization. For development, the local backend with manual command approval provides reasonable safety. For production, switching to the Docker backend adds OS-level isolation that contains any destructive commands to the container filesystem.

How does Hermes Agent handle security?

Hermes applies defense-in-depth. Before any tool call executes, the system checks user authorization, scans the command against dangerous patterns, optionally runs it through the Tirith content scanner, and then executes it inside the configured backend (local, Docker, SSH, or cloud sandbox). Credentials are filtered from MCP subprocesses and redacted from error messages. Project context files are scanned for prompt injection before loading.

Can Hermes Agent run in a sandbox?

Yes. Hermes supports five sandboxed backends: Docker, Singularity, Modal, Daytona, and Vercel Sandbox. Docker is the most common choice for production. Containers run with all Linux capabilities dropped, no privilege escalation allowed, PID limits to prevent fork bombs, and separate noexec tmpfs mounts. When using any container backend, dangerous command approval checks are skipped because the container boundary provides the isolation.

How do I secure Hermes Agent in production?

Use the Docker backend with explicit user allowlists, never enable allow-all flags, store API keys in a chmod 600 .env file, set MESSAGING_CWD to a dedicated working directory, run as non-root, configure appropriate resource limits, and enable DM pairing instead of hardcoded user IDs. For additional isolation, use the SSH backend to separate the messaging gateway from the execution environment.

What is YOLO mode in Hermes Agent?

YOLO mode disables all dangerous command approval checks for the current session. You can activate it with the --yolo CLI flag, the /yolo slash command, or the HERMES_YOLO_MODE=1 environment variable. Even in YOLO mode, the hardline blocklist remains active, preventing operations like rm -rf / and fork bombs. YOLO mode is recommended only in trusted environments running well-tested automation scripts.

Does Hermes Agent protect against prompt injection?

Hermes scans project context files (AGENTS.md, .cursorrules, SOUL.md) for prompt injection patterns before loading them into the system prompt. It checks for instructions to ignore prior instructions, hidden HTML comments, attempts to read secrets, credential exfiltration commands, and invisible Unicode characters. Blocked files produce a warning instead of loading, and the Tirith scanner adds a second layer of pre-execution content analysis.

Hermes Agent Security Guide: Isolation and Authorization (2026)

How the Seven-Layer Defense Model Works

Nous Research's Hermes Agent is an open-source (MIT-licensed) AI agent built for persistent, autonomous operation. That autonomy creates a real tension: the agent needs to run shell commands, install packages, and manage files, but those same capabilities can destroy a host system if left unchecked.

Hermes addresses this with seven overlapping security layers, applied in order during every tool call:

User authorization checks whether the person (or bot) sending a message is allowed to interact at all, using allowlists and a pairing-code system.
Dangerous command approval intercepts destructive operations like rm -rf or DROP TABLE before they execute.
Container isolation runs agent commands inside Docker, Singularity, Modal, or another sandboxed backend so mistakes stay contained.
MCP credential filtering strips API keys and tokens from environment variables passed to MCP subprocesses.
Context file scanning detects prompt injection attempts in project files like AGENTS.md or .cursorrules before loading them into the system prompt.
Cross-session isolation separates data and state between concurrent tasks.
Input sanitization validates working directories and parameters before execution.

This defense-in-depth approach means no single layer has to be perfect. A command that slips past approval still hits the container boundary. A prompt injection that evades content scanning still faces credential filtering. Each layer catches what the previous one missed.

Command Approval and the Hardline Blocklist

Before Hermes executes any shell command, it checks against a curated set of dangerous patterns. The approval system has three modes, configured via approvals.mode in ~/.hermes/config.yaml:

Manual (default): Every flagged command pauses and asks the user to approve or deny.
Smart: An auxiliary LLM assesses risk. Low-risk commands auto-approve, genuinely dangerous ones auto-deny, and uncertain cases escalate to the user.
Off: Disables all approval checks. Equivalent to running hermes --yolo or sending /yolo in chat.

The patterns that trigger approval cover the operations most likely to cause irreversible damage:

Recursive deletion (rm -r, find -delete, xargs rm)
SQL destructive operations (DROP TABLE, DELETE FROM without a WHERE clause, TRUNCATE)
Permission changes (chmod 777, chmod o+w, recursive chown to root)
Filesystem formatting (mkfs, dd if=/dev/zero)
System service manipulation (systemctl stop, systemctl disable)
Piping untrusted URLs to shell interpreters (curl | sh, bash <(curl ...))
Overwrites to sensitive paths (/etc/, ~/.ssh/, ~/.hermes/.env)

When a dangerous command is detected in the CLI, the user gets four options: approve once, approve for the session, add to a permanent allowlist, or deny. In messaging gateway mode (Telegram, Discord, Slack), the agent sends the command details to chat and waits for a yes/no reply. If no response arrives within the configured timeout (60 seconds by default), the command is denied, so the approval flow fails closed.

Beneath the configurable approval system sits a hardline blocklist that cannot be disabled by any setting, including YOLO mode. These are the "never under any circumstances" operations:

rm -rf / and its variants
Bash fork bombs (:(){ :|:& };:)
mkfs.* on mounted root devices
dd if=/dev/zero of=/dev/sd*

When a hardline-blocked command is attempted, Hermes returns an error message explaining why it was blocked, without executing anything.
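The two-tier check described above can be sketched in a few lines of Python. This is only an illustration: the pattern lists, the `classify` function, and its return values are hypothetical, and the real Hermes pattern set is larger and may be implemented differently.

```python
import re

# Hypothetical hardline patterns: always blocked, even in YOLO mode.
HARDLINE_PATTERNS = [
    r"\brm\s+-\w*r\w*\s+/(\s|$)",          # recursive rm aimed at the root directory
    r":\(\)\s*\{\s*:\|:&\s*\};\s*:",       # classic bash fork bomb
]

# Hypothetical approval patterns: paused for a human decision unless YOLO is on.
APPROVAL_PATTERNS = [
    r"\brm\s+-\w*r",                        # any recursive rm
    r"\bDROP\s+TABLE\b",                    # destructive SQL
    r"\bchmod\s+777\b",                     # world-writable permissions
    r"\bcurl\b.*\|\s*(sh|bash)\b",          # piping a download into a shell
]


def classify(command: str, yolo: bool = False) -> str:
    """Return 'blocked', 'needs_approval', or 'allow' for a shell command."""
    # The hardline tier is checked first and ignores the yolo flag.
    if any(re.search(p, command) for p in HARDLINE_PATTERNS):
        return "blocked"
    if yolo:
        return "allow"
    if any(re.search(p, command, re.IGNORECASE) for p in APPROVAL_PATTERNS):
        return "needs_approval"
    return "allow"
```

The ordering is the point of the sketch: the hardline check runs before the YOLO shortcut, so disabling approvals never reaches past it.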

Container Hardening and Backend Selection

Hermes supports seven deployment backends, each with a different isolation posture. The choice of backend determines whether you rely on command approval (software-level checks) or container boundaries (OS-level isolation) as your primary defense.

Backend	Isolation Level	Command Approval	Best For
local	None	Yes	Development and testing
ssh	Remote machine	Yes	Separate server
docker	Container	Skipped	Production gateway
singularity	Container	Skipped	HPC environments
modal	Cloud sandbox	Skipped	Scalable cloud
daytona	Cloud sandbox	Skipped	Persistent cloud
vercel_sandbox	Cloud microVM	Skipped	Cloud with snapshots

Container backends (Docker, Singularity, Modal, Daytona, Vercel Sandbox) skip dangerous command checks entirely because the container itself is the security boundary. If the agent runs rm -rf / inside a Docker container, it destroys the container filesystem, not your host.

Docker containers run with aggressive hardening by default:

All Linux capabilities dropped (--cap-drop ALL), with only DAC_OVERRIDE, CHOWN, and FOWNER re-added for package manager operations
No privilege escalation (--security-opt no-new-privileges)
PID limit of 256 to prevent fork bombs
Separate tmpfs mounts for /tmp (512MB, nosuid), /var/tmp (256MB, noexec), and /run (64MB, noexec)
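For readers who want to see what that hardening looks like on the command line, the list above maps onto standard `docker run` flags. The helper below is a hypothetical sketch that only builds an argument list; how Hermes actually invokes Docker is not shown in this guide and may differ.

```python
def hardened_docker_args(image: str) -> list[str]:
    """Build a `docker run` argument list mirroring the hardening list above.

    Hypothetical helper for illustration; it does not start a container.
    """
    return [
        "docker", "run", "--rm",
        # Drop every capability, then re-add the three needed by package managers.
        "--cap-drop", "ALL",
        "--cap-add", "DAC_OVERRIDE",
        "--cap-add", "CHOWN",
        "--cap-add", "FOWNER",
        # Block setuid-style privilege escalation inside the container.
        "--security-opt", "no-new-privileges",
        # Cap the process count so a fork bomb cannot exhaust the host.
        "--pids-limit", "256",
        # Size-limited in-memory scratch space.
        "--tmpfs", "/tmp:rw,nosuid,size=512m",
        "--tmpfs", "/var/tmp:rw,noexec,size=256m",
        "--tmpfs", "/run:rw,noexec,size=64m",
        image,
    ]
```

Passing the result to `subprocess.run` would launch the container; printing it with `" ".join(...)` is a quick way to compare against your own Docker setup.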

Resource limits are configurable in ~/.hermes/config.yaml:

terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  container_cpu: 1
  container_memory: 5120
  container_disk: 51200
  container_persistent: true

The container_persistent flag controls whether workspace data survives between sessions. When set to true, Hermes bind-mounts /workspace and /root from ~/.hermes/sandboxes/docker/<task_id>/ on the host. When false, everything runs on tmpfs and disappears on cleanup. For production deployments that need durable output, persistent mode with external storage is the recommended pattern.

For teams running Hermes as a messaging gateway, the SSH backend offers an extra isolation layer. You run the gateway process on one machine (handling Telegram, Discord, or Slack connections) while actual command execution happens on a separate server via SSH:

terminal:
  backend: ssh

This separates the messaging attack surface from the execution environment. Even if someone compromises the gateway, the agent's shell access lives on a different machine behind SSH key authentication.

User Authorization and DM Pairing

When Hermes runs as a messaging gateway (accepting commands via Telegram, Discord, Slack, WhatsApp, or Signal), authorization controls who can talk to the bot. The system checks identity through a layered priority chain:

Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
DM pairing approved list
Platform-specific allowlists
Global allowlist (GATEWAY_ALLOWED_USERS)
Global allow-all (GATEWAY_ALLOW_ALL_USERS=true)
Default: deny
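The priority chain can be read as a short function that returns as soon as one rule grants access and denies otherwise. The sketch below is illustrative only: `is_authorized`, its arguments, and the `paired` mapping are hypothetical, and it assumes each platform's variables follow the `<PLATFORM>_ALLOW_ALL_USERS` and `<PLATFORM>_ALLOWED_USERS` naming used in this guide.

```python
def _ids(value):
    """Parse a comma-separated ID list such as '123,456' into a set."""
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def _flag(env, name):
    """True when the named setting is the string 'true' (case-insensitive)."""
    return env.get(name, "").strip().lower() == "true"


def is_authorized(platform, user_id, env, paired):
    """Walk the chain in the documented order; anything unmatched is denied.

    env    -- mapping of environment variables (for example, os.environ)
    paired -- hypothetical mapping of platform name to DM-pairing approved IDs
    """
    prefix = platform.upper()
    if _flag(env, f"{prefix}_ALLOW_ALL_USERS"):               # 1. per-platform allow-all
        return True
    if user_id in paired.get(platform, set()):                # 2. DM pairing approved list
        return True
    if user_id in _ids(env.get(f"{prefix}_ALLOWED_USERS")):   # 3. platform allowlist
        return True
    if user_id in _ids(env.get("GATEWAY_ALLOWED_USERS")):     # 4. global allowlist
        return True
    if _flag(env, "GATEWAY_ALLOW_ALL_USERS"):                 # 5. global allow-all
        return True
    return False                                              # 6. default: deny
```

Because every rule in the chain can only grant access, the final `return False` is what makes the gateway deny by default when nothing is configured.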

The simplest approach is hardcoding user IDs in ~/.hermes/.env:

TELEGRAM_ALLOWED_USERS=123456789,987654321
DISCORD_ALLOWED_USERS=111222333444555666
SLACK_ALLOWED_USERS=U01ABC123

But for teams where membership changes, the DM pairing system is more practical. When an unknown user messages the bot, they receive an 8-character pairing code. The bot owner approves the code via CLI (hermes pairing approve telegram ABC12DEF), and the user is permanently authorized for that platform.

The pairing system follows OWASP and NIST SP 800-63-4 guidelines:

Codes use a 32-character unambiguous alphabet (no 0/O or 1/I confusion)
Generated with secrets.choice() for cryptographic randomness
Each code expires after 1 hour
Rate limited to 1 request per user per 10 minutes
Maximum 3 pending codes per platform at any time
5 failed attempts trigger a 1-hour lockout
Pairing data stored with chmod 0600 permissions
Codes are never logged to stdout
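A minimal sketch of the code-generation step, assuming the 32-character alphabet is the uppercase letters and digits with 0, O, 1, and I removed. The names `ALPHABET`, `new_pairing_code`, and `is_expired` are hypothetical; only `secrets.choice()`, the 8-character length, and the 1-hour expiry come from the description above.

```python
import secrets
import string

# Assumed alphabet: A-Z and 0-9 without the look-alike characters 0, O, 1, I.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1I"
)

CODE_TTL_SECONDS = 60 * 60  # each code expires after 1 hour


def new_pairing_code(length: int = 8) -> str:
    """Draw each character with secrets.choice() for cryptographic randomness."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_expired(issued_at: float, now: float) -> bool:
    """True once more than CODE_TTL_SECONDS have passed since the code was issued."""
    return now - issued_at > CODE_TTL_SECONDS
```

With 32 symbols and 8 positions there are 32^8 (about 1.1 trillion) possible codes, which is why the rate limits and lockout matter more than code length for resisting guessing.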

Revoking access is equally straightforward: hermes pairing revoke telegram 123456789 removes a user, and hermes pairing clear-pending wipes all pending codes.

Credential Filtering and SSRF Protection

Even inside a hardened container, the agent interacts with external services through MCP servers, web requests, and environment variables. Hermes applies credential controls at each boundary.

MCP environment isolation. When Hermes launches an MCP subprocess, only safe system variables pass through: PATH, HOME, USER, LANG, LC_ALL, TERM, SHELL, TMPDIR, and XDG_* prefixed variables. Every API key, token, and secret is stripped. Services that need credentials get them through explicit per-server configuration:

mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."

This prevents a compromised or poorly written MCP server from harvesting credentials meant for other services.
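The filtering rule is simple enough to sketch directly. The function name `mcp_env` and its arguments are hypothetical; the allowlisted variable names and the `XDG_` prefix are the ones listed above.

```python
# Variable names that pass through to MCP subprocesses, per the list above.
SAFE_NAMES = {"PATH", "HOME", "USER", "LANG", "LC_ALL", "TERM", "SHELL", "TMPDIR"}


def mcp_env(parent_env, server_env=None):
    """Build the environment for one MCP subprocess.

    parent_env -- the agent's own environment (for example, dict(os.environ))
    server_env -- the explicit per-server `env` block from the config, if any
    """
    # Start from an allowlist, so unknown variables are dropped by default.
    env = {
        name: value
        for name, value in parent_env.items()
        if name in SAFE_NAMES or name.startswith("XDG_")
    }
    # Credentials only arrive through explicit per-server configuration.
    env.update(server_env or {})
    return env
```

An allowlist is the safer design here: a newly added secret in the parent environment is excluded automatically, whereas a denylist would have to be updated to know about it.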

Credential redaction. When tool execution produces error messages, Hermes scrubs them before returning content to the LLM. Patterns matching GitHub PATs (ghp_...), OpenAI-style keys (sk-...), Bearer tokens, and common parameter names (token=, key=, API_KEY=, password=, secret=) are all redacted. The LLM never sees raw credentials in error output.
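As a rough illustration of this kind of scrubbing, the sketch below applies a few regular expressions to an error string. The exact expressions, the `[REDACTED]` placeholder, and the `redact` function are assumptions for this example, not the patterns Hermes ships.

```python
import re

# Hypothetical redaction rules: (pattern, replacement) pairs applied in order.
RULES = [
    (re.compile(r"\bghp_[A-Za-z0-9]+"), "[REDACTED]"),           # GitHub PATs
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "[REDACTED]"),       # OpenAI-style keys
    (re.compile(r"(Bearer\s+)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    # key=value style parameters; keep the name, drop the value.
    (
        re.compile(
            r"\b((?:token|key|api_key|password|secret)=)[^\s&]+", re.IGNORECASE
        ),
        r"\1[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Scrub credential-looking substrings before text is shown to the LLM."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The word-boundary anchors keep ordinary words such as "task-runner" from being mistaken for an `sk-` key, which is the usual trade-off between over-redaction and missed secrets in pattern-based scrubbing.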

SSRF protection. Every URL the agent fetches is validated against a blocklist of private and internal addresses before the request fires:

Private networks (RFC 1918): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Loopback: 127.0.0.0/8 and ::1
Link-local: 169.254.0.0/16, including cloud metadata endpoints at 169.254.169.254
CGNAT ranges (RFC 6598): 100.64.0.0/10, covering Tailscale and WireGuard networks
Cloud metadata hostnames: metadata.google.internal, metadata.goog

DNS failures are treated as blocked (fail-closed), and redirect chains are re-validated at each hop. This prevents the agent from being tricked into hitting internal services or cloud metadata APIs through DNS rebinding or redirect chains.
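A validation step along these lines can be built from Python's standard `ipaddress` and `socket` modules. The sketch below is an assumption-laden illustration: `is_blocked_ip`, `is_url_allowed`, and the injectable `resolve` parameter are hypothetical names, and a real fetcher would also need to run the same check on every redirect hop and connect to the address it validated.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# The blocked ranges listed above.
BLOCKED_NETS = [
    ipaddress.ip_network(net)
    for net in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918 private
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16",                                  # link-local, cloud metadata
        "100.64.0.0/10",                                   # CGNAT (RFC 6598)
    )
]
BLOCKED_HOSTS = {"metadata.google.internal", "metadata.goog"}


def is_blocked_ip(ip: str) -> bool:
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Resolve the host and reject the URL if any resolved address is internal."""
    host = urlparse(url).hostname
    if not host or host.lower() in BLOCKED_HOSTS:
        return False
    try:
        infos = resolve(host, None)
        # Each entry ends with a sockaddr tuple whose first item is the IP string.
        return not any(is_blocked_ip(info[4][0]) for info in infos)
    except (OSError, ValueError):
        return False  # fail closed: DNS errors and unparsable addresses are blocked
```

Checking every address returned by the resolver, rather than only the first, matters because a hostname can legitimately resolve to several addresses and only one of them needs to be internal for the request to be dangerous.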

For legitimate internal network access (home labs, LAN-only Ollama instances), you can set security.allow_private_urls: true in the config. But for any public-facing gateway deployment, leave this off.

Website blocklist. You can also block specific domains from agent access:

security:
  website_blocklist:
    enabled: true
    domains:
      - "*.internal.company.com"
      - "admin.example.com"

This applies across all URL-capable tools: web search, content extraction, and browser navigation.

Pre-Exec Scanning and Prompt Injection Defense

Two additional security layers protect against attacks that target the LLM itself rather than the host system.

Tirith pre-execution scanning. Hermes integrates Tirith, a content-level scanner that checks commands before execution. Tirith detects homograph URL spoofing (internationalized domain attacks that make malicious URLs look legitimate), pipe-to-interpreter patterns (curl | bash), and terminal injection attacks. The scanner auto-installs from GitHub releases with SHA-256 checksum verification.

When Tirith flags a command, the finding integrates directly with the approval flow. The user sees the severity, a description of the threat, and suggested safer alternatives. The default action for unattended execution is deny.

You can configure Tirith's behavior in ~/.hermes/config.yaml:

security:
  tirith_enabled: true
  tirith_timeout: 5
  tirith_fail_open: true

The tirith_fail_open setting controls what happens when Tirith itself is unavailable (crashed, not installed, timed out). The default (true) lets commands proceed. In high-security environments, set this to false so a missing scanner blocks all execution.
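The difference between the two settings comes down to one branch. The sketch below models it with a stand-in scanner callable; `ScannerUnavailable`, `scan_decision`, and the string results are hypothetical and do not reflect Tirith's actual interface.

```python
from typing import Callable


class ScannerUnavailable(Exception):
    """Raised by the stand-in scanner when it crashed, is missing, or timed out."""


def scan_decision(
    command: str,
    scanner: Callable[[str], bool],
    fail_open: bool = True,
) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a command.

    scanner -- hypothetical callable that returns True when it flags the command
    """
    try:
        flagged = scanner(command)
    except ScannerUnavailable:
        # fail_open=True lets the command proceed; False blocks all execution.
        return "allow" if fail_open else "deny"
    # A finding feeds into the approval flow rather than blocking outright.
    return "needs_approval" if flagged else "allow"
```

Setting `fail_open=False` trades availability for safety: an outage of the scanner stops the agent from running commands until the scanner is back.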

Context file injection protection. Before Hermes loads project context files (AGENTS.md, .cursorrules, SOUL.md) into the system prompt, it scans them for prompt injection patterns:

Instructions to ignore prior instructions
Hidden HTML comments containing suspicious keywords
Attempts to read secrets (.env, credentials, .netrc)
Credential exfiltration via curl
Invisible Unicode characters (zero-width spaces, bidirectional text overrides)

Blocked files produce a clear warning instead of silently loading: [BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]

This matters because project context files are often contributed by multiple team members or pulled from external repositories. A single malicious AGENTS.md file could otherwise instruct the agent to exfiltrate API keys or execute destructive commands. Scanning these files before they reach the LLM closes that attack vector.
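To make the scanning step concrete, here is a small sketch that checks a context file against one pattern per category and produces the warning format shown above. The regular expressions and all category labels other than `prompt_injection` are assumptions made for this example; the real detection rules are not listed in this guide.

```python
import re
from typing import Optional

# Hypothetical checks, one per category described above. First match wins.
CHECKS = {
    "prompt_injection": re.compile(
        r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE
    ),
    "hidden_comment": re.compile(
        r"<!--.*?(secret|password|token|ignore).*?-->", re.IGNORECASE | re.DOTALL
    ),
    "secret_read": re.compile(
        r"\bcat\s+\S*(\.env|\.netrc|credentials)\b", re.IGNORECASE
    ),
    "exfiltration": re.compile(
        r"\bcurl\b[^\n]*\$\{?\w*(KEY|TOKEN|SECRET)", re.IGNORECASE
    ),
    # Zero-width characters and bidirectional override or isolate controls.
    "invisible_unicode": re.compile(
        r"[\u200b-\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]"
    ),
}


def scan_context_file(name: str, text: str) -> Optional[str]:
    """Return a BLOCKED warning if the file looks malicious, else None."""
    for label, pattern in CHECKS.items():
        if pattern.search(text):
            return (
                f"[BLOCKED: {name} contained potential prompt injection "
                f"({label}). Content not loaded.]"
            )
    return None
```

A caller would load the file into the system prompt only when the function returns `None`, and surface the returned warning otherwise.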

Storing Security Audit Trails

For teams running Hermes in production, the security logs at ~/.hermes/logs/ capture authorization attempts, command approvals, and blocked operations. But those logs live on the agent's host and can be lost during container restarts or infrastructure changes.

Pairing Hermes with an external workspace platform solves the persistence problem. Fastio provides persistent, versioned storage that agents can write to through the MCP server or REST API. Security logs, audit trails, and agent outputs persist across sessions and container rebuilds. The workspace's built-in Intelligence indexing means you can search audit logs semantically ("show me all denied commands from last week") rather than grepping through raw text files.

The Business Trial comes with 50GB of storage, bundled credits, and 5 workspaces, with no credit card required. For a Hermes deployment that generates security-sensitive output, having an independent storage layer that survives container teardowns is worth the five-minute setup.

How to Secure Hermes Agent in Production

A secure Hermes deployment touches every layer of the defense model. Here is the recommended configuration for a production gateway:

Authorization. Set explicit allowlists for every messaging platform you expose. Never use GATEWAY_ALLOW_ALL_USERS=true in production. Enable DM pairing so new team members can self-onboard without you editing config files, and periodically audit the approved users list with hermes pairing list.

Execution backend. Use terminal.backend: docker for production gateways. Set CPU, memory, and disk limits appropriate for your workload. The defaults (1 CPU, 5GB RAM, 50GB disk) are reasonable starting points, but memory-intensive tasks like browser automation may need more headroom.

Credential hygiene. Store API keys in ~/.hermes/.env and set chmod 600 on the file. Never commit .env files to version control. Use per-server MCP environment configuration instead of global environment variables. Review the command_allowlist in your config periodically to make sure you haven't permanently approved something you shouldn't have.

Working directory. Set the MESSAGING_CWD environment variable to a dedicated directory. This prevents the agent from operating in sensitive locations like your home directory or a production checkout.

Process user. Run the gateway process as a non-root user. The official Docker image already defaults to UID 10000, but if you are using the local or SSH backend, create a dedicated hermes user with limited filesystem access.

External storage. Container filesystems are ephemeral by design. For agent outputs that need to survive container rebuilds, send them to an external workspace. This applies to generated reports, processed files, and anything a human needs to review later. S3, Google Cloud Storage, or a workspace platform like Fastio all work. Fastio adds the advantage of built-in file versioning, audit trails, and ownership transfer so you can hand off agent-created workspaces to clients or team members.

Updates. Run hermes update regularly. Security patches for the approval patterns, Tirith scanner, and SSRF blocklists ship through the standard update channel.

Monitoring. Check ~/.hermes/logs/ for unauthorized access attempts, repeated command denials, and pairing code abuse. Set up external log shipping if your team has a centralized logging platform.
