
How to Monitor Hermes Agent: Observability and Tracing Setup

Hermes Agent ships with a layered observability stack that most agent frameworks lack out of the box. This guide walks through all four layers: built-in JSONL trajectory logging, Langfuse tracing for per-turn LLM visibility, Portkey for cost tracking and provider management, and the hermes-labyrinth plugin for read-only journey inspection. You will finish with a working setup that captures every tool call, model switch, and token spend.

Fast.io Editorial Team 13 min read

What Are the Four Layers of Hermes Agent Observability

Most AI agent observability guides describe a single integration, usually OpenTelemetry or a vendor SDK. Hermes Agent from Nous Research takes a different approach by stacking four complementary layers, each solving a distinct problem.

  1. Built-in JSONL trajectory logging records every completed conversation as an append-only file. This is your post-incident replay source and training data pipeline.
  2. Langfuse tracing captures per-turn spans, per-call generations, and per-tool observations with cost and token breakdowns. This is your debugging microscope.
  3. Portkey gateway monitoring tracks 40+ metrics including cost, token usage, and response time across 200+ LLM providers. This is your budget guardrail and provider failover layer.
  4. hermes-labyrinth plugin provides a read-only dashboard for journey inspection, skill diagnostics, and exportable reports. This is your operator control plane.

Each layer can run independently. You can start with just trajectory logging (it ships with Hermes and only needs to be switched on in config) and add Langfuse and Portkey as your deployment matures. The rest of this guide covers setup and configuration for each layer, starting with what ships built in.

Built-in Trajectory Logging and the RedactingFormatter

Hermes Agent's most underrated observability feature is its JSONL trajectory system. Every completed conversation gets saved as a single JSON object containing the full turn-by-turn transcript, timestamp, model identifier, and completion status.

The system routes trajectories to separate files based on outcome:

  • trajectory_samples.jsonl captures conversations that completed successfully
  • failed_trajectories.jsonl captures conversations that failed or were interrupted

This separation matters for two reasons. First, you can analyze failure patterns without filtering through thousands of successful runs. Second, successful trajectories double as training data in ShareGPT-compatible format, ready for fine-tuning workflows.

Each trajectory entry follows a consistent structure:

{
  "conversations": [
    {"from": "system", "value": "..."},
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."}
  ],
  "timestamp": "2026-05-17T14:30:00Z",
  "model": "claude-sonnet-4-6",
  "completed": true
}

Tool calls get converted to XML-wrapped JSON with parsed arguments, and tool responses are grouped into single turns with results joined by newlines. Reasoning tokens from any provider get normalized into <think> tags, regardless of whether the model uses native thinking tokens or system-prompted XML.
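
Because each line is one self-contained JSON object, a few lines of Python are enough to get a first look at failure patterns. The sketch below assumes only the fields shown in the structure above (`conversations`, `model`); the file location is an assumption, so point it at wherever your install writes its trajectory files.

```python
import json
from collections import Counter
from pathlib import Path


def load_trajectories(path):
    """Yield one dict per non-empty line of a JSONL trajectory file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(entries):
    """Count entries per model and turns per speaker role."""
    per_model = Counter()
    per_role = Counter()
    for entry in entries:
        per_model[entry.get("model", "unknown")] += 1
        for turn in entry.get("conversations", []):
            per_role[turn.get("from", "unknown")] += 1
    return per_model, per_role


if __name__ == "__main__":
    # Assumed path: adjust to your own Hermes data directory.
    target = Path("failed_trajectories.jsonl")
    if target.exists():
        models, roles = summarize(load_trajectories(target))
        print("failures per model:", dict(models))
        print("turns per role:", dict(roles))
```

Running the same summary over trajectory_samples.jsonl gives you a baseline to compare the failure counts against.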

Enable trajectory saving through YAML config or CLI flag:

# In ~/.hermes/config.yaml
agent:
  save_trajectories: true

# Or via CLI
hermes --save-trajectories

Security at the log layer. Hermes uses a RedactingFormatter that strips API keys and tokens before any log entry hits disk. The dual-file logging system writes INFO+ events to agent.log and WARNING+ events to errors.log, both using RotatingFileHandler with the redaction formatter applied. That substantially lowers the risk of credentials leaking into trajectory files or log archives, although it is still worth spot-checking any file before you share it outside the team.
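
To make the pattern concrete, here is a minimal sketch of a redacting formatter wired to two rotating log files using Python's standard logging module. It is not Hermes' actual implementation: the regex patterns, the `build_logger` helper, the rotation sizes, and the logger name are all assumptions chosen for illustration.

```python
import logging
import re
from logging.handlers import RotatingFileHandler
from pathlib import Path

# Hypothetical secret patterns; a real deployment would tune these.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"pk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"(?<=bearer )[A-Za-z0-9._-]{8,}", re.IGNORECASE),
]


class RedactingFormatter(logging.Formatter):
    """Format the record normally, then mask anything that looks like a secret."""

    def format(self, record):
        text = super().format(record)
        for pattern in SECRET_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text


def build_logger(log_dir, name="hermes-sketch"):
    """Attach an INFO+ file and a WARNING+ file, both with redaction applied."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    formatter = RedactingFormatter("%(asctime)s %(levelname)s %(message)s")
    targets = (("agent.log", logging.INFO), ("errors.log", logging.WARNING))
    for filename, level in targets:
        handler = RotatingFileHandler(
            log_dir / filename, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setLevel(level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Putting the redaction in the formatter rather than at each call site means every handler that uses it is covered, including rotated archives, without the calling code having to remember to scrub anything.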


Langfuse Tracing for Per-Turn Visibility

Trajectory files show you what happened. Langfuse shows you why it happened and how long each step took.

The observability/langfuse plugin creates structured traces with three levels of granularity: one span per agent turn, one generation per LLM API call, and one tool observation per tool invocation. This hierarchy lets you drill from a high-level session view down to individual API latencies and token counts.

Installation and Configuration

Install the Langfuse SDK and enable the plugin:

pip install langfuse
hermes plugins enable observability/langfuse

Add your credentials to ~/.hermes/.env:

HERMES_LANGFUSE_PUBLIC_KEY=pk-lf-...
HERMES_LANGFUSE_SECRET_KEY=sk-lf-...
HERMES_LANGFUSE_BASE_URL=https://cloud.langfuse.com

Hermes accepts both HERMES_LANGFUSE_* and standard LANGFUSE_* environment variables, so if you already have Langfuse configured for another project, the plugin picks up your existing credentials.
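
The fallback behavior can be pictured as a two-step lookup. The sketch below is illustrative only: the helper name is hypothetical, and the precedence (Hermes-prefixed variable first, then the standard one) is an assumption rather than something the plugin documents here.

```python
import os


def resolve_langfuse_setting(name, env=None, default=None):
    """Return the first non-empty value among the prefixed and standard names."""
    env = os.environ if env is None else env
    # Assumed order: the Hermes-specific variable wins over the shared one.
    for key in (f"HERMES_LANGFUSE_{name}", f"LANGFUSE_{name}"):
        value = env.get(key)
        if value:
            return value
    return default


if __name__ == "__main__":
    for setting in ("PUBLIC_KEY", "SECRET_KEY"):
        found = resolve_langfuse_setting(setting) is not None
        print(setting, "configured" if found else "missing")
```

A check like this is handy in a deployment script, since the plugin itself stays silent when credentials are absent.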

How the Plugin Traces Execution

The plugin hooks into four points in the agent loop:

  • pre_api_request opens a root span labeled "Hermes turn"
  • post_api_request closes the generation and attaches usage metrics and cost details
  • pre_tool_call starts a tool observation with sanitized arguments
  • post_tool_call closes the observation with result data

Usage metrics align with Hermes' canonical agent.usage_pricing numbers, so the Langfuse dashboard shows the same input/output/cache token breakdown that appears in hermes logs. Sessions are grouped using the Hermes session ID through langfuse.propagate_attributes, which means multi-turn conversations appear as a single trace tree.
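
To see how four hooks produce a span, generation, and tool hierarchy, it can help to model them with an in-memory recorder. This sketch does not use the Langfuse SDK and is not the plugin's code: the method signatures, the `Observation` class, and the explicit `end_turn` call (the point where the root span closes) are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    kind: str  # "span", "generation", or "tool"
    name: str
    started: float
    ended: Optional[float] = None
    data: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class InMemoryTracer:
    """Records one root span per turn with generations and tools as children."""

    def __init__(self):
        self.turns = []
        self._turn = None
        self._generation = None
        self._tools = {}

    def pre_api_request(self, model):
        # The first API request of a turn opens the root span.
        if self._turn is None:
            self._turn = Observation("span", "Hermes turn", time.monotonic())
            self.turns.append(self._turn)
        self._generation = Observation("generation", model, time.monotonic())
        self._turn.children.append(self._generation)

    def post_api_request(self, usage):
        self._generation.ended = time.monotonic()
        self._generation.data["usage"] = dict(usage)

    def pre_tool_call(self, call_id, tool_name, arguments):
        obs = Observation("tool", tool_name, time.monotonic())
        obs.data["arguments"] = arguments
        self._tools[call_id] = obs
        self._turn.children.append(obs)

    def post_tool_call(self, call_id, result):
        obs = self._tools.pop(call_id)
        obs.ended = time.monotonic()
        obs.data["result"] = result

    def end_turn(self):
        # Hypothetical: closes the root span once the turn is finished.
        self._turn.ended = time.monotonic()
        self._turn = None
```

A turn that calls the model, runs one tool, and calls the model again ends up as one "Hermes turn" span with three children in order: generation, tool, generation.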

Tuning and Sampling

For high-volume deployments, you probably don't need to trace every single turn. The plugin supports several tuning variables:

  • HERMES_LANGFUSE_SAMPLE_RATE: Set to 0.1 to trace 10% of sessions (default: 1.0)
  • HERMES_LANGFUSE_MAX_CHARS: Cap the size of traced tool results (default: 12000). Large file reads get summarized instead of logged verbatim.
  • HERMES_LANGFUSE_ENV: Tag traces with an environment like production or staging
  • HERMES_LANGFUSE_RELEASE: Tag traces with a version for deployment correlation

The plugin is fail-open by design. If the Langfuse SDK is missing, credentials are wrong, or the Langfuse server is down, the plugin silently skips tracing. The agent loop is never blocked or slowed by observability failures.
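
The three behaviors described above (session sampling, size capping, and failing open) can each be sketched in a few lines. None of this is the plugin's actual code: hash-based sampling is one plausible way to keep a whole session either traced or untraced, plain truncation stands in for whatever summarization the plugin performs, and all function names are hypothetical.

```python
import functools
import hashlib


def should_trace(session_id, sample_rate):
    """Deterministically keep roughly `sample_rate` of sessions."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


def cap_text(text, max_chars=12000):
    """Truncate oversized payloads and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}... [{omitted} chars omitted]"


def fail_open(hook):
    """Swallow any exception from a tracing hook so the agent loop continues."""

    @functools.wraps(hook)
    def wrapper(*args, **kwargs):
        try:
            return hook(*args, **kwargs)
        except Exception:
            return None

    return wrapper
```

Hashing the session ID, rather than drawing a random number per turn, means every turn in a sampled session is traced, so you never end up with half a trace tree.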

Verify the integration is active:

hermes plugins list

Look for observability/langfuse in the enabled plugins list.

Fastio features

Persist Hermes Agent observability artifacts across sessions

Free 50GB workspace with auto-indexing. Upload trajectory files and log exports, then search them by meaning. No credit card, MCP-ready endpoint for agent uploads.

Portkey Gateway for Cost Tracking and Provider Management

Langfuse answers "what did the agent do?" Portkey answers "how much did it cost and which provider served the request?"

Portkey sits between Hermes and your LLM providers as an OpenAI-compatible gateway. Every API call passes through Portkey, which logs it, measures it, and applies any budget or rate controls you've configured. This is especially important for Hermes because the agent runs unattended via cron jobs, messaging gateways (Telegram, Discord, Slack, WhatsApp), and scheduled automations where a runaway session could burn through your budget before anyone notices.

Setup

Portkey configuration replaces the LLM provider endpoint in Hermes with the Portkey gateway URL:

hermes config set model.base_url https://api.portkey.ai/v1
hermes config set model.default "@openai-prod/gpt-4o"

The @<provider-slug>/<model-name> format lets you define named provider routes. Add your Portkey API key and virtual key configuration through the Portkey dashboard, then reference them in ~/.hermes/config.yaml.
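
If you generate Hermes config from scripts, it is worth validating the route string before writing it. The helper below is a hypothetical sketch that only checks the `@<provider-slug>/<model-name>` shape described above; it does not confirm that the slug exists in your Portkey account.

```python
def parse_model_route(value):
    """Split '@provider-slug/model-name' into (slug, model)."""
    if not value.startswith("@"):
        raise ValueError(f"route must start with '@': {value!r}")
    slug, separator, model = value[1:].partition("/")
    if not separator or not slug or not model:
        raise ValueError(f"expected '@<provider-slug>/<model-name>': {value!r}")
    return slug, model


if __name__ == "__main__":
    print(parse_model_route("@openai-prod/gpt-4o"))
```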

What Portkey Tracks

Portkey monitors 40+ metrics across every request, including:

  • Cost analysis with per-request and cumulative spend breakdowns
  • Token usage split by input, output, and cache tokens
  • Response time distributions with percentile analysis
  • Error rates by provider, model, and status code
  • Request logs with complete request/response pairs and custom metadata tags

The dashboard gives you a single view across all providers, which matters when Hermes switches models mid-session using /model commands.

Budget Controls

Set spending limits to prevent runaway costs:

  • Monthly cost caps (e.g., $200/month for a production agent)
  • Token limits (e.g., 10M tokens/week)
  • Rate limits (requests per minute per provider)

When a threshold is reached, Portkey blocks the request rather than letting it through. For agents running overnight or handling automated workflows, this is the difference between waking up to a $50 bill and a $5,000 bill.
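
The arithmetic behind a cost cap is simple enough to model locally, which helps when choosing a threshold. The following is a toy model, not Portkey's API: the `BudgetGuard` class, the per-million-token prices, and the rule that requests are blocked once spend reaches the cap are all assumptions for illustration.

```python
from dataclasses import dataclass


def request_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request given prices in USD per million tokens."""
    return (input_tokens / 1_000_000) * input_price + (
        output_tokens / 1_000_000
    ) * output_price


@dataclass
class BudgetGuard:
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def allow(self):
        """Requests pass only while spend is still below the cap."""
        return self.spent_usd < self.monthly_cap_usd

    def record(self, cost_usd):
        self.spent_usd += cost_usd


if __name__ == "__main__":
    guard = BudgetGuard(monthly_cap_usd=200.0)
    # Hypothetical prices: 2.50 USD input and 10.00 USD output per million tokens.
    cost = request_cost_usd(100_000, 20_000, 2.5, 10.0)
    calls = 0
    while guard.allow():
        guard.record(cost)
        calls += 1
    print(f"blocked after {calls} calls at ${guard.spent_usd:.2f}")
```

Plugging in your own typical request size shows roughly how many unattended calls a given cap allows before requests start being refused.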

Failover and Load Balancing

Portkey also handles reliability concerns that most teams build ad-hoc:

  • Fallbacks route to backup providers when the primary returns errors
  • Load balancing distributes requests across multiple API keys with weighted distribution
  • Caching reduces costs for repeated queries, which is common with scheduled tasks that re-process similar prompts
  • Retries handle transient failures automatically based on status codes

Since Portkey supports 200+ LLM providers and over 1,600 models, you can switch between Anthropic, Google, Mistral, Azure OpenAI, and others without changing Hermes configuration beyond the provider slug.

The hermes-labyrinth Plugin for Journey Inspection

The three layers above capture data. hermes-labyrinth turns that data into something an operator can browse.

hermes-labyrinth is a community-built, read-only observability plugin that functions as a "black-box recorder for agents moving through unknown work." It doesn't modify agent behavior. It watches, records, and organizes what the agent does into navigable structures.

Installation

Clone the plugin into your Hermes plugins directory:

mkdir -p ~/.hermes/plugins
git clone https://github.com/stainlu/hermes-labyrinth.git \
  ~/.hermes/plugins/hermes-labyrinth

Access it through the Hermes web dashboard at the Labyrinth tab after running hermes dashboard.

Core Concepts

The plugin organizes observability data around four abstractions:

Journeys are indexed records of recent agent activity across CLI, dashboard, gateway, cron, and delegated work. Think of a journey as a single session or task execution from start to finish.

Crossings are the ordered steps within a journey. Each crossing captures prompts, tool calls, tool results, failures, model switches, subagent delegations, approvals, memory operations, redactions, context compressions, and cron runs.

Guideposts are inspector views that show input, output, duration, status, and evidence for any individual crossing. When you need to understand exactly why a tool call failed or how long a specific model call took, the guidepost gives you the drill-down.

Reports are exportable Markdown or JSON summaries of complete journeys, with secret redaction applied automatically.

Additional Features

Beyond journey tracking, hermes-labyrinth provides:

  • A skill atlas showing which skills are active, which are shadowed by overrides, and duplicate diagnostics
  • A cron gate for monitoring scheduled automations
  • A model ferry that tracks model transitions across sessions, useful when Hermes switches providers mid-conversation

API Access

The plugin exposes read-only endpoints for programmatic access:

  • /api/plugins/hermes-labyrinth/journeys lists recent journeys
  • /api/plugins/hermes-labyrinth/skills shows skill diagnostics
  • /api/plugins/hermes-labyrinth/cron displays scheduled job status
  • /api/plugins/hermes-labyrinth/reports/{journey_id}.json exports journey data

All endpoints apply the same secret redaction that the UI uses. If the redaction tools are unavailable, the plugin fails securely rather than exposing raw data.
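
A small client for these endpoints might look like the sketch below. The paths come from the list above; the dashboard base URL (`http://localhost:8000` here) is an assumption, as are the helper names, and the response shape is deliberately left uninterpreted because it is not documented in this guide.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_PREFIX = "/api/plugins/hermes-labyrinth"


def endpoint(base_url, resource):
    """Join the dashboard base URL with a plugin resource path."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/{resource.lstrip('/')}"


def report_url(base_url, journey_id):
    """URL of the JSON report export for one journey."""
    return endpoint(base_url, f"reports/{quote(journey_id, safe='')}.json")


def fetch_json(url, timeout=10):
    """GET a read-only endpoint and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed address: use whatever host and port `hermes dashboard` reports.
    base = "http://localhost:8000"
    print(endpoint(base, "journeys"))
    print(endpoint(base, "skills"))
    print(endpoint(base, "cron"))
```

Once the dashboard is running, passing any of these URLs to `fetch_json` returns the decoded payload for further scripting.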


How to Persist Observability Artifacts Across Deployments

Hermes Agent runs on a server, whether local, Docker, SSH, or a cloud provider like Modal. Trajectory files, log archives, and exported reports accumulate on that server's filesystem. This works fine for a single instance, but problems emerge quickly when you run multiple Hermes deployments, need to share incident data with teammates, or want to preserve observability artifacts beyond the server's lifecycle.

Local storage is the default and works for development. For production Hermes deployments, you need the observability data somewhere accessible and durable.

Options for persisting observability artifacts include cloud object storage like S3 or GCS, network-attached volumes if you're running in a container, or a shared workspace platform. Fast.io provides a workspace approach that works well for agent-generated artifacts: upload trajectory files, log exports, and hermes-labyrinth reports to a shared workspace where both the agent and human operators can access them.

The practical setup looks like this:

  • Hermes writes trajectory files and log archives to its local filesystem
  • A post-session script or cron job uploads artifacts to a persistent workspace
  • Human operators browse, search, and download artifacts through the workspace UI
  • When Intelligence Mode is enabled on the workspace, uploaded files are automatically indexed for semantic search, so you can ask questions like "show me all failed trajectories from last Tuesday that involved the GitHub tool"
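
The post-session step can start as a script that gathers artifacts into a timestamped batch directory, which you then hand to whatever upload tool you prefer. This is a sketch under assumptions: the source directory, the `stage_artifacts` helper, and the batch naming scheme are hypothetical, and the upload itself is intentionally left out.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

ARTIFACT_NAMES = (
    "trajectory_samples.jsonl",
    "failed_trajectories.jsonl",
    "agent.log",
    "errors.log",
)


def stage_artifacts(source_dir, staging_root, names=ARTIFACT_NAMES, now=None):
    """Copy known artifact files into a UTC-timestamped batch directory."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    batch = Path(staging_root) / now.strftime("%Y%m%dT%H%M%SZ")
    batch.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        source = Path(source_dir) / name
        if source.is_file():
            shutil.copy2(source, batch / name)
            copied.append(batch / name)
    return copied


if __name__ == "__main__":
    # Assumed locations: adjust both paths to your deployment.
    staged = stage_artifacts(Path.home() / ".hermes", "/tmp/hermes-artifacts")
    for path in staged:
        print("staged", path)
```

Staging first keeps the upload step atomic from the agent's point of view: Hermes can keep appending to the live files while the batch copy is being shipped.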

Fast.io's free agent tier includes 50GB of storage, 5,000 AI credits per month, and 5 workspaces with no credit card required. The MCP server at /mcp gives agents programmatic access to upload, organize, and query workspace contents. For teams running multiple Hermes instances across different tasks, a shared workspace becomes the central observability archive that persists regardless of which server instances are running.

Other viable options include syncing trajectory files to an S3 bucket with lifecycle policies, mounting a shared NFS volume across instances, or building a custom pipeline that ships logs to your existing ELK or Grafana stack. The right choice depends on your existing infrastructure and how many people need access to the data.

How to Roll Out Each Observability Layer Step by Step

Here's the practical order for setting up Hermes Agent observability from scratch.

Start with trajectory logging. Enable save_trajectories: true in your config. This costs nothing, runs locally, and gives you the raw conversation data you need for debugging and training. The RedactingFormatter handles secret stripping automatically.

Add Langfuse when you need debugging depth. Once you start running multi-turn sessions or complex tool chains, trajectory files alone won't tell you where time is being spent. Langfuse's per-turn spans and per-call generations give you the waterfall view you need to identify bottlenecks. Start with a HERMES_LANGFUSE_SAMPLE_RATE of 1.0 and lower it once you have enough baseline data.

Add Portkey when costs matter. If you're running Hermes on a schedule, through messaging gateways, or across multiple providers, Portkey's budget controls and failover routing prevent the kind of cost surprises that come with unattended agent operations. The 40+ metrics dashboard gives you the provider-level visibility that Langfuse doesn't cover.

Install hermes-labyrinth when you need operator tooling. The plugin is most valuable when non-developers need to inspect agent behavior, review journey paths, or export incident reports. Its read-only design means there's zero risk of the observability layer affecting agent behavior.

For ongoing telemetry improvements, keep an eye on GitHub issue #6741, which proposes structured session tracing with start/end timestamps and parent-child span relationships. This would add timing instrumentation for the prompt build, model call, tool dispatch, and response phases, enabling performance waterfall analysis that currently requires combining data from Langfuse and trajectory files manually.

Frequently Asked Questions

How do I monitor Hermes Agent performance?

Enable Langfuse tracing with `hermes plugins enable observability/langfuse` and add your Langfuse credentials to `~/.hermes/.env`. The plugin creates per-turn spans and per-call generations that show latency, token usage, and cost for every step. For provider-level performance data across multiple LLMs, route requests through the Portkey gateway, which tracks 40+ metrics including response time distributions.

Does Hermes Agent support Langfuse?

Yes. Hermes ships a built-in `observability/langfuse` plugin that traces every agent turn, LLM call, and tool invocation. Install it with `pip install langfuse` and `hermes plugins enable observability/langfuse`. The plugin uses four hooks (pre/post API request and pre/post tool call) to create structured traces. It's fail-open, so if Langfuse is unreachable, the agent continues without interruption.

How do I track Hermes Agent costs across providers?

Configure Portkey as a gateway by setting `model.base_url` to `https://api.portkey.ai/v1` in your Hermes config. Portkey logs every request with cost, token, and timing data across 200+ LLM providers. You can set monthly cost caps, token limits, and rate limits. When Hermes switches providers mid-session with `/model` commands, Portkey tracks spending across all of them in a single dashboard.

What metrics does Hermes Agent log?

At the trajectory level, Hermes logs the full conversation transcript, model identifier, timestamp, completion status, tool call counts, and toolsets used. Through Langfuse, you get input/output/cache token counts, cost estimates aligned with Hermes' canonical `agent.usage_pricing`, and per-tool execution data. Through Portkey, you get 40+ metrics including cost analysis, token usage breakdowns, response time percentiles, and error rates by provider.

What is the hermes-labyrinth plugin?

hermes-labyrinth is a community-built, read-only observability plugin for Hermes Agent. It records agent journeys (sessions), crossings (individual steps), and guideposts (detailed inspections) without modifying agent behavior. Install it by cloning the repository into `~/.hermes/plugins/` and access it through the Hermes web dashboard. It also provides a skill atlas, cron monitoring, model transition tracking, and exportable reports with automatic secret redaction.

How do I export Hermes Agent logs for analysis?

Trajectory files are standard JSONL and can be ingested by any log analysis tool. Successful runs go to `trajectory_samples.jsonl` and failures go to `failed_trajectories.jsonl`. For structured exports, hermes-labyrinth provides JSON and Markdown report endpoints at `/api/plugins/hermes-labyrinth/reports/{journey_id}`. Langfuse and Portkey both have their own export and API access for historical trace data.
