How to Monitor Hermes Agent: Observability and Tracing Setup
Hermes Agent ships with a layered observability stack that most agent frameworks lack out of the box. This guide walks through all four layers: built-in JSONL trajectory logging, Langfuse tracing for per-turn LLM visibility, Portkey for cost tracking and provider management, and the hermes-labyrinth plugin for read-only journey inspection. You will finish with a working setup that captures every tool call, model switch, and token spend.
What Are the Four Layers of Hermes Agent Observability
Most AI agent observability guides describe a single integration, usually OpenTelemetry or a vendor SDK. Hermes Agent from Nous Research takes a different approach by stacking four complementary layers, each solving a distinct problem.
- Built-in JSONL trajectory logging records every completed conversation as an append-only file. This is your post-incident replay source and training data pipeline.
- Langfuse tracing captures per-turn spans, per-call generations, and per-tool observations with cost and token breakdowns. This is your debugging microscope.
- Portkey gateway monitoring tracks 40+ metrics including cost, token usage, and response time across 200+ LLM providers. This is your budget guardrail and provider failover layer.
- hermes-labyrinth plugin provides a read-only dashboard for journey inspection, skill diagnostics, and exportable reports. This is your operator control plane.
Each layer can run independently. You can start with just trajectory logging, which needs only a single config flag, and add Langfuse and Portkey as your deployment matures. The rest of this guide covers setup and configuration for each layer, starting with what ships built in.
Built-in Trajectory Logging and the RedactingFormatter
Hermes Agent's most underrated observability feature is its JSONL trajectory system. Every completed conversation gets saved as a single JSON object containing the full turn-by-turn transcript, timestamp, model identifier, and completion status.
The system routes trajectories to separate files based on outcome:
- trajectory_samples.jsonl captures conversations that completed successfully
- failed_trajectories.jsonl captures conversations that failed or were interrupted
This separation matters for two reasons. First, you can analyze failure patterns without filtering through thousands of successful runs. Second, successful trajectories double as training data in ShareGPT-compatible format, ready for fine-tuning workflows.
Each trajectory entry follows a consistent structure:
{
  "conversations": [
    {"from": "system", "value": "..."},
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."}
  ],
  "timestamp": "2026-05-17T14:30:00Z",
  "model": "claude-sonnet-4-6",
  "completed": true
}
Tool calls get converted to XML-wrapped JSON with parsed arguments, and tool responses are grouped into single turns with results joined by newlines. Reasoning tokens from any provider get normalized into <think> tags, regardless of whether the model uses native thinking tokens or system-prompted XML.
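Because each line is a self-contained JSON object, a few lines of Python are enough to get a first overview of a trajectory file. The following is a minimal sketch that assumes only the fields shown in the example entry above; `summarize_trajectories` is a hypothetical helper name, not part of Hermes.

```python
import json
from collections import Counter


def summarize_trajectories(lines):
    """Summarize an iterable of JSONL lines in the trajectory format shown above."""
    summary = {
        "total": 0,
        "completed": 0,
        "by_model": Counter(),
        "turns_by_role": Counter(),
    }
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        summary["total"] += 1
        if record.get("completed"):
            summary["completed"] += 1
        summary["by_model"][record.get("model", "unknown")] += 1
        for turn in record.get("conversations", []):
            summary["turns_by_role"][turn.get("from", "unknown")] += 1
    return summary


if __name__ == "__main__":
    sample = json.dumps({
        "conversations": [
            {"from": "system", "value": "..."},
            {"from": "human", "value": "..."},
            {"from": "gpt", "value": "..."},
        ],
        "timestamp": "2026-05-17T14:30:00Z",
        "model": "claude-sonnet-4-6",
        "completed": True,
    })
    print(summarize_trajectories([sample]))
```

In practice you would pass an open file handle, for example `summarize_trajectories(open("failed_trajectories.jsonl"))`, since iterating a file yields one line at a time.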
Enable trajectory saving through YAML config or CLI flag:
# In ~/.hermes/config.yaml
agent:
  save_trajectories: true
# Or via CLI
hermes --save-trajectories
Security at the log layer. Hermes uses a RedactingFormatter that strips API keys and tokens before any log entry hits disk. The dual-file logging system writes INFO+ events to agent.log and WARNING+ events to errors.log, both using RotatingFileHandler with the redaction formatter applied. Credentials are therefore redacted before they reach either log file or its rotated archives, with no extra configuration.
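To make the pattern concrete, here is a minimal sketch of a redacting formatter wired to two rotating log files. It illustrates the general technique with Python's standard `logging` module and is not Hermes' actual implementation: the regular expressions, the `redact` helper, the `build_logger` function, and the rotation sizes are all assumptions chosen for the example.

```python
import logging
import os
import re
from logging.handlers import RotatingFileHandler

# Hypothetical patterns; a real deployment would match its own key formats.
SECRET_PATTERNS = [
    re.compile(r"\b(?:sk|pk)-[A-Za-z0-9_-]{8,}"),
    re.compile(r"(?<=bearer )[A-Za-z0-9._-]{8,}", re.IGNORECASE),
]


def redact(text):
    """Replace anything that looks like a secret with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class RedactingFormatter(logging.Formatter):
    """Format the record as usual, then scrub secrets from the final string."""

    def format(self, record):
        return redact(super().format(record))


def build_logger(log_dir, name="hermes-sketch"):
    """Attach INFO+ and WARNING+ rotating file handlers that share one formatter."""
    formatter = RedactingFormatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    for filename, level in (("agent.log", logging.INFO), ("errors.log", logging.WARNING)):
        handler = RotatingFileHandler(
            os.path.join(log_dir, filename), maxBytes=1_000_000, backupCount=3
        )
        handler.setLevel(level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Redacting in the formatter rather than at each call site means a secret that slips into any log message is scrubbed at the last step before it is written.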
Langfuse Tracing for Per-Turn Visibility
Trajectory files show you what happened. Langfuse shows you why it happened and how long each step took.
The observability/langfuse plugin creates structured traces with three levels of granularity: one span per agent turn, one generation per LLM API call, and one tool observation per tool invocation. This hierarchy lets you drill from a high-level session view down to individual API latencies and token counts.
Installation and Configuration
Install the Langfuse SDK and enable the plugin:
pip install langfuse
hermes plugins enable observability/langfuse
Add your credentials to ~/.hermes/.env:
HERMES_LANGFUSE_PUBLIC_KEY=pk-lf-...
HERMES_LANGFUSE_SECRET_KEY=sk-lf-...
HERMES_LANGFUSE_BASE_URL=https://cloud.langfuse.com
Hermes accepts both HERMES_LANGFUSE_* and standard LANGFUSE_* environment variables, so if you already have Langfuse configured for another project, the plugin picks up your existing credentials.
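The fallback between the two variable families can be pictured as a simple lookup. This sketch assumes the `HERMES_LANGFUSE_*` name wins when both are set; the source does not state the precedence, and `resolve_langfuse_setting` is a hypothetical helper rather than the plugin's own code.

```python
import os


def resolve_langfuse_setting(name, environ=None, default=None):
    """Return the first non-empty value among the Hermes-prefixed and standard names.

    Assumption: the HERMES_LANGFUSE_* variable takes precedence over LANGFUSE_*.
    """
    env = os.environ if environ is None else environ
    for key in (f"HERMES_LANGFUSE_{name}", f"LANGFUSE_{name}"):
        value = env.get(key)
        if value:
            return value
    return default


if __name__ == "__main__":
    for setting in ("PUBLIC_KEY", "SECRET_KEY", "BASE_URL"):
        found = resolve_langfuse_setting(setting) is not None
        print(f"{setting}: {'set' if found else 'missing'}")
```

Running the script in the same shell as Hermes is a quick way to check which credentials are visible without printing their values.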
How the Plugin Traces Execution
The plugin hooks into four points in the agent loop:
- pre_api_request opens a root span labeled "Hermes turn"
- post_api_request closes the generation and attaches usage metrics and cost details
- pre_tool_call starts a tool observation with sanitized arguments
- post_tool_call closes the observation with result data
Usage metrics align with Hermes' canonical agent.usage_pricing numbers, so the Langfuse dashboard shows the same input/output/cache token breakdown that appears in hermes logs. Sessions are grouped using the Hermes session ID through langfuse.propagate_attributes, which means multi-turn conversations appear as a single trace tree.
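The shape of the resulting trace is easier to see with a toy recorder. The sketch below keeps everything in memory and does not use the Langfuse SDK; the method names mirror the four hooks, but their arguments (`model`, `usage`, `call_id`, and so on) are hypothetical and chosen only for illustration.

```python
import time


class TurnRecorder:
    """Toy in-memory stand-in for a tracing backend, covering a single agent turn."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.span = None             # root span for the turn
        self._open_generation = None
        self._open_tools = {}        # call_id -> tool observation still in flight

    def _ensure_span(self):
        if self.span is None:
            self.span = {
                "name": "Hermes turn",
                "session_id": self.session_id,
                "generations": [],
                "tools": [],
            }

    def pre_api_request(self, model):
        """Open the root span if needed and start timing one LLM call."""
        self._ensure_span()
        self._open_generation = {"model": model, "started": time.monotonic()}

    def post_api_request(self, usage):
        """Close the generation and attach its usage numbers."""
        generation = self._open_generation
        self._open_generation = None
        generation["duration_s"] = time.monotonic() - generation.pop("started")
        generation["usage"] = dict(usage)
        self.span["generations"].append(generation)

    def pre_tool_call(self, call_id, tool_name, arguments):
        """Start a tool observation keyed by its call id."""
        self._ensure_span()
        self._open_tools[call_id] = {
            "tool": tool_name,
            "arguments": dict(arguments),
            "started": time.monotonic(),
        }

    def post_tool_call(self, call_id, result):
        """Close the matching tool observation and store its result."""
        observation = self._open_tools.pop(call_id)
        observation["duration_s"] = time.monotonic() - observation.pop("started")
        observation["result"] = result
        self.span["tools"].append(observation)
```

Calling the four methods in order for one model call and one tool call leaves `span` holding one generation and one tool observation under a single "Hermes turn" root, which is the same hierarchy the Langfuse dashboard presents.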
Tuning and Sampling
For high-volume deployments, you probably don't need to trace every single turn. The plugin supports several tuning variables:
- HERMES_LANGFUSE_SAMPLE_RATE: Set to 0.1 to trace 10% of sessions (default: 1.0)
- HERMES_LANGFUSE_MAX_CHARS: Cap the size of traced tool results (default: 12000). Large file reads get summarized instead of logged verbatim.
- HERMES_LANGFUSE_ENV: Tag traces with an environment like production or staging
- HERMES_LANGFUSE_RELEASE: Tag traces with a version for deployment correlation
The plugin is fail-open by design. If the Langfuse SDK is missing, credentials are wrong, or the Langfuse server is down, the plugin silently skips tracing. The agent loop is never blocked or slowed by observability failures.
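The three behaviors in this section (session sampling, size capping, and failing open) are common tracing patterns. The sketch below shows one way each could work; it is not the plugin's code. In particular, hashing the session ID is an assumed sampling strategy, and plain truncation stands in for whatever summarization the plugin applies to large results.

```python
import functools
import hashlib


def should_trace(session_id, sample_rate):
    """Deterministically keep roughly `sample_rate` of sessions.

    Hashing the session ID (an assumption) keeps every turn of a session
    on the same side of the sampling decision.
    """
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


def cap_text(text, max_chars=12000):
    """Truncate oversized payloads and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"... [{omitted} chars omitted]"


def fail_open(fn):
    """Swallow any exception from a tracing call so the caller is never blocked."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            return None
    return wrapper
```

Sampling per session rather than per turn matters because a trace tree with missing turns is harder to read than no trace at all.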
Verify the integration is active:
hermes plugins list
Look for observability/langfuse in the enabled plugins list.
Portkey Gateway for Cost Tracking and Provider Management
Langfuse answers "what did the agent do?" Portkey answers "how much did it cost and which provider served the request?"
Portkey sits between Hermes and your LLM providers as an OpenAI-compatible gateway. Every API call passes through Portkey, which logs it, measures it, and applies any budget or rate controls you've configured. This is especially important for Hermes because the agent runs unattended via cron jobs, messaging gateways (Telegram, Discord, Slack, WhatsApp), and scheduled automations where a runaway session could burn through your budget before anyone notices.
Setup
Portkey configuration replaces the LLM provider endpoint in Hermes with the Portkey gateway URL:
hermes config set model.base_url https://api.portkey.ai/v1
hermes config set model.default "@openai-prod/gpt-4o"
The @<provider-slug>/<model-name> format lets you define named provider routes. Add your Portkey API key and virtual key configuration through the Portkey dashboard, then reference them in ~/.hermes/config.yaml.
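Before pointing Hermes at the gateway, it can help to confirm the route works on its own. The sketch below builds an OpenAI-style chat completion request for the Portkey endpoint using only the standard library. The `x-portkey-api-key` header reflects Portkey's documented authentication header as best I understand it, so treat it as an assumption and check it against the current Portkey docs; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

PORTKEY_BASE_URL = "https://api.portkey.ai/v1"


def build_chat_request(api_key, model, prompt):
    """Build (but do not send) a chat completion request routed through the gateway."""
    body = json.dumps({
        "model": model,  # e.g. "@openai-prod/gpt-4o", the provider-slug form used above
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{PORTKEY_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-portkey-api-key": api_key,  # assumed header name, see lead-in
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` with a valid key sends the request; a successful response there suggests the provider slug and key are set up correctly before Hermes is involved.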
What Portkey Tracks
Portkey monitors 40+ metrics across every request, including:
- Cost analysis with per-request and cumulative spend breakdowns
- Token usage split by input, output, and cache tokens
- Response time distributions with percentile analysis
- Error rates by provider, model, and status code
- Request logs with complete request/response pairs and custom metadata tags
The dashboard gives you a single view across all providers, which matters when Hermes switches models mid-session using /model commands.
Budget Controls
Set spending limits to prevent runaway costs:
- Monthly cost caps (e.g., $200/month for a production agent)
- Token limits (e.g., 10M tokens/week)
- Rate limits (requests per minute per provider)
When a threshold is reached, Portkey blocks the request rather than letting it through. For agents running overnight or handling automated workflows, this is the difference between waking up to a $50 bill and a $5,000 bill.
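The blocking behavior is simple to reason about with a local model. This is a toy guard that illustrates the idea of refusing a request once a cap would be exceeded; it is not how Portkey implements budgets, and `BudgetGuard` and `BudgetExceeded` are hypothetical names.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised instead of letting a request through once the cap is hit."""


@dataclass
class BudgetGuard:
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd):
        """Record a request's cost, or refuse it if it would exceed the cap.

        Returns the remaining budget after the charge.
        """
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"cap of ${self.monthly_cap_usd:.2f} reached "
                f"(spent ${self.spent_usd:.2f})"
            )
        self.spent_usd += cost_usd
        return self.monthly_cap_usd - self.spent_usd


if __name__ == "__main__":
    guard = BudgetGuard(monthly_cap_usd=200.0)
    print(guard.charge(150.0))  # remaining budget after the first charge
    try:
        guard.charge(60.0)
    except BudgetExceeded as exc:
        print(f"blocked: {exc}")
```

The key design point is that the check fails closed: an unattended agent gets an error it can surface rather than a silently growing bill.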
Failover and Load Balancing
Portkey also handles reliability concerns that most teams build ad-hoc:
- Fallbacks route to backup providers when the primary returns errors
- Load balancing distributes requests across multiple API keys with weighted distribution
- Caching reduces costs for repeated queries, which is common with scheduled tasks that re-process similar prompts
- Retries handle transient failures automatically based on status codes
Since Portkey supports 200+ LLM providers and over 1,600 models, you can switch between Anthropic, Google, Mistral, Azure OpenAI, and others without changing Hermes configuration beyond the provider slug.
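Fallbacks, retries, and caching are typically expressed as a JSON gateway config in Portkey. The sketch below assembles one such config as a Python dictionary. The field names (`strategy`, `targets`, `virtual_key`, `retry`, `cache`) follow Portkey's config schema as I understand it and should be treated as assumptions to verify against the current documentation; the virtual key names are placeholders.

```python
import json


def build_gateway_config(primary_key, backup_key, retry_attempts=3):
    """Assemble a fallback config with retries and simple caching.

    All field names are assumptions based on Portkey's config schema.
    """
    return {
        "strategy": {"mode": "fallback"},
        "targets": [
            {"virtual_key": primary_key},  # tried first
            {"virtual_key": backup_key},   # used when the primary returns errors
        ],
        "retry": {
            "attempts": retry_attempts,
            "on_status_codes": [429, 500, 502, 503, 504],
        },
        "cache": {"mode": "simple"},
    }


if __name__ == "__main__":
    config = build_gateway_config("openai-prod-key", "anthropic-backup-key")
    print(json.dumps(config, indent=2))
```

Keeping the config in code makes it easy to review alongside the rest of a deployment, then paste the printed JSON into the Portkey dashboard.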
The hermes-labyrinth Plugin for Journey Inspection
The three layers above capture data. hermes-labyrinth turns that data into something an operator can browse.
hermes-labyrinth is a community-built, read-only observability plugin that functions as a "black-box recorder for agents moving through unknown work." It doesn't modify agent behavior. It watches, records, and organizes what the agent does into navigable structures.
Installation
Clone the plugin into your Hermes plugins directory:
mkdir -p ~/.hermes/plugins
git clone https://github.com/stainlu/hermes-labyrinth.git \
  ~/.hermes/plugins/hermes-labyrinth
Access it through the Hermes web dashboard at the Labyrinth tab after running hermes dashboard.
Core Concepts
The plugin organizes observability data around four abstractions:
Journeys are indexed records of recent agent activity across CLI, dashboard, gateway, cron, and delegated work. Think of a journey as a single session or task execution from start to finish.
Crossings are the ordered steps within a journey. Each crossing captures prompts, tool calls, tool results, failures, model switches, subagent delegations, approvals, memory operations, redactions, context compressions, and cron runs.
Guideposts are inspector views that show input, output, duration, status, and evidence for any individual crossing. When you need to understand exactly why a tool call failed or how long a specific model call took, the guidepost gives you the drill-down.
Reports are exportable Markdown or JSON summaries of complete journeys, with secret redaction applied automatically.
Additional Features
Beyond journey tracking, hermes-labyrinth provides:
- A skill atlas showing which skills are active, which are shadowed by overrides, and duplicate diagnostics
- A cron gate for monitoring scheduled automations
- A model ferry that tracks model transitions across sessions, useful when Hermes switches providers mid-conversation
API Access
The plugin exposes read-only endpoints for programmatic access:
- /api/plugins/hermes-labyrinth/journeys lists recent journeys
- /api/plugins/hermes-labyrinth/skills shows skill diagnostics
- /api/plugins/hermes-labyrinth/cron displays scheduled job status
- /api/plugins/hermes-labyrinth/reports/{journey_id}.json exports journey data
All endpoints apply the same secret redaction that the UI uses. If the redaction tools are unavailable, the plugin fails securely rather than exposing raw data.
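A small client makes these endpoints easy to script against. This sketch builds the URLs listed above and fetches them with the standard library. The dashboard's host and port are not given in this guide, so `base_url` is left as a required argument; `labyrinth_url` and `fetch_json` are hypothetical helper names, and the sketch assumes each endpoint returns JSON.

```python
import json
import urllib.request
from urllib.parse import quote

API_PREFIX = "/api/plugins/hermes-labyrinth"


def labyrinth_url(base_url, resource, journey_id=None):
    """Build the URL for one of the read-only endpoints listed above."""
    base = base_url.rstrip("/")
    if resource == "reports":
        if journey_id is None:
            raise ValueError("reports requires a journey_id")
        return f"{base}{API_PREFIX}/reports/{quote(journey_id, safe='')}.json"
    if resource in ("journeys", "skills", "cron"):
        return f"{base}{API_PREFIX}/{resource}"
    raise ValueError(f"unknown resource: {resource}")


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (assumes the endpoint returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the dashboard running, `fetch_json(labyrinth_url(base_url, "journeys"))` would return the recent journeys, which a script can then iterate to pull individual reports.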
How to Persist Observability Artifacts Across Deployments
Hermes Agent runs on a server, whether local, Docker, SSH, or a cloud provider like Modal. Trajectory files, log archives, and exported reports accumulate on that server's filesystem. This works fine for a single instance, but problems emerge quickly when you run multiple Hermes deployments, need to share incident data with teammates, or want to preserve observability artifacts beyond the server's lifecycle.
Local storage is the default and works for development. For production Hermes deployments, you need the observability data somewhere accessible and durable.
Options for persisting observability artifacts include cloud object storage like S3 or GCS, network-attached volumes if you're running in a container, or a shared workspace platform. Fast.io provides a workspace approach that works well for agent-generated artifacts: upload trajectory files, log exports, and hermes-labyrinth reports to a shared workspace where both the agent and human operators can access them.
The practical setup looks like this:
- Hermes writes trajectory files and log archives to its local filesystem
- A post-session script or cron job uploads artifacts to a persistent workspace
- Human operators browse, search, and download artifacts through the workspace UI
- When Intelligence Mode is enabled on the workspace, uploaded files are automatically indexed for semantic search, so you can ask questions like "show me all failed trajectories from last Tuesday that involved the GitHub tool"
Fast.io's free agent tier includes 50GB of storage, 5,000 AI credits per month, and 5 workspaces with no credit card required. The MCP server at /mcp gives agents programmatic access to upload, organize, and query workspace contents. For teams running multiple Hermes instances across different tasks, a shared workspace becomes the central observability archive that persists regardless of which server instances are running.
Other viable options include syncing trajectory files to an S3 bucket with lifecycle policies, mounting a shared NFS volume across instances, or building a custom pipeline that ships logs to your existing ELK or Grafana stack. The right choice depends on your existing infrastructure and how many people need access to the data.
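Whichever destination you choose, the post-session step usually starts by bundling the artifacts into one timestamped archive. The sketch below does that with the standard library and leaves the upload itself to your storage tool of choice. The directory that holds the trajectory and log files is an assumption you must supply, and `bundle_artifacts` is a hypothetical helper.

```python
import tarfile
import tempfile
import time
from pathlib import Path

# File names mentioned in this guide; adjust to match your deployment.
ARTIFACT_NAMES = (
    "trajectory_samples.jsonl",
    "failed_trajectories.jsonl",
    "agent.log",
    "errors.log",
)


def bundle_artifacts(source_dir, dest_dir, names=ARTIFACT_NAMES):
    """Pack whichever artifact files exist into a timestamped .tar.gz archive."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive_path = dest / f"hermes-artifacts-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in names:
            path = source / name
            if path.is_file():  # skip artifacts this session did not produce
                archive.add(path, arcname=name)
    return archive_path


if __name__ == "__main__":
    # Self-contained demo using throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hermes"
        src.mkdir()
        (src / "agent.log").write_text("demo log line\n")
        print(bundle_artifacts(src, Path(tmp) / "outbox"))
```

A cron job can call this after each session and hand the returned path to whatever uploader fits your infrastructure.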
How to Roll Out Each Observability Layer Step by Step
Here's the practical order for setting up Hermes Agent observability from scratch.
Start with trajectory logging. Enable save_trajectories: true in your config. This costs nothing, runs locally, and gives you the raw conversation data you need for debugging and training. The RedactingFormatter handles secret stripping automatically.
Add Langfuse when you need debugging depth. Once you start running multi-turn sessions or complex tool chains, trajectory files alone won't tell you where time is being spent. Langfuse's per-turn spans and per-call generations give you the waterfall view you need to identify bottlenecks. Start with a HERMES_LANGFUSE_SAMPLE_RATE of 1.0 and lower it once you have enough baseline data.
Add Portkey when costs matter. If you're running Hermes on a schedule, through messaging gateways, or across multiple providers, Portkey's budget controls and failover routing prevent the kind of cost surprises that come with unattended agent operations. The 40+ metrics dashboard gives you the provider-level visibility that Langfuse doesn't cover.
Install hermes-labyrinth when you need operator tooling. The plugin is most valuable when non-developers need to inspect agent behavior, review journey paths, or export incident reports. Its read-only design means there's zero risk of the observability layer affecting agent behavior.
For ongoing telemetry improvements, keep an eye on GitHub issue #6741, which proposes structured session tracing with start/end timestamps and parent-child span relationships. This would add timing instrumentation for the prompt build, model call, tool dispatch, and response phases, enabling performance waterfall analysis that currently requires combining data from Langfuse and trajectory files manually.
Frequently Asked Questions
How do I monitor Hermes Agent performance?
Enable Langfuse tracing with `hermes plugins enable observability/langfuse` and add your Langfuse credentials to `~/.hermes/.env`. The plugin creates per-turn spans and per-call generations that show latency, token usage, and cost for every step. For provider-level performance data across multiple LLMs, route requests through the Portkey gateway, which tracks 40+ metrics including response time distributions.
Does Hermes Agent support Langfuse?
Yes. Hermes ships a built-in `observability/langfuse` plugin that traces every agent turn, LLM call, and tool invocation. Install it with `pip install langfuse` and `hermes plugins enable observability/langfuse`. The plugin uses four hooks (pre/post API request and pre/post tool call) to create structured traces. It's fail-open, so if Langfuse is unreachable, the agent continues without interruption.
How do I track Hermes Agent costs across providers?
Configure Portkey as a gateway by setting `model.base_url` to `https://api.portkey.ai/v1` in your Hermes config. Portkey logs every request with cost, token, and timing data across 200+ LLM providers. You can set monthly cost caps, token limits, and rate limits. When Hermes switches providers mid-session with `/model` commands, Portkey tracks spending across all of them in a single dashboard.
What metrics does Hermes Agent log?
At the trajectory level, Hermes logs the full conversation transcript, model identifier, timestamp, completion status, tool call counts, and toolsets used. Through Langfuse, you get input/output/cache token counts, cost estimates aligned with Hermes' canonical `agent.usage_pricing`, and per-tool execution data. Through Portkey, you get 40+ metrics including cost analysis, token usage breakdowns, response time percentiles, and error rates by provider.
What is the hermes-labyrinth plugin?
hermes-labyrinth is a community-built, read-only observability plugin for Hermes Agent. It records agent journeys (sessions), crossings (individual steps), and guideposts (detailed inspections) without modifying agent behavior. Install it by cloning the repository into `~/.hermes/plugins/` and access it through the Hermes web dashboard. It also provides a skill atlas, cron monitoring, model transition tracking, and exportable reports with automatic secret redaction.
How do I export Hermes Agent logs for analysis?
Trajectory files are standard JSONL and can be ingested by any log analysis tool. Successful runs go to `trajectory_samples.jsonl` and failures go to `failed_trajectories.jsonl`. For structured exports, hermes-labyrinth produces Markdown or JSON reports; the JSON export is served at `/api/plugins/hermes-labyrinth/reports/{journey_id}.json`. Langfuse and Portkey both have their own export and API access for historical trace data.