How to Run Hermes Agent as a Discord Bot With Voice Channel Support
Hermes Agent connects to Discord as a full messaging gateway, handling text conversations, slash commands, file attachments, and live voice channels from a single background process. This guide walks through every step from creating your Discord application to joining a voice channel and persisting files your agent generates across sessions.
What the Hermes Agent Discord Gateway Does
Nous Research's Hermes Agent is an open-source (MIT-licensed) autonomous AI agent with persistent memory, installable skills, scheduled automations, and connections to over 20 messaging platforms. Discord is one of the best-supported gateways, with text messaging, slash commands, file attachments, threaded conversations, and live voice channels all working natively.
The messaging gateway runs as a single background process that connects to every platform you configure. It handles session isolation, cron jobs, and voice delivery in one daemon. When a user sends a message in a Discord channel or DM, the gateway routes it through a per-chat session store and dispatches it to the agent core for processing. That means your agent's reasoning, tool use, and memory work identically whether someone messages on Discord, Telegram, Slack, or any other connected platform.
What sets the Discord integration apart from typical chatbot tutorials is scope. Most guides walk you through a simple bot that responds with static text or basic API calls. Hermes Agent brings persistent memory across conversations, tool calling for real tasks, skills that extend its capabilities, and voice channel support where the bot joins a channel, listens to users speaking, transcribes their speech, processes it through the full agent pipeline, and speaks the response back. The gap this guide fills is getting all of that running on your Discord server, then solving the file persistence problem that most setups ignore.
Step 1: Create the Discord Bot Application
Start at the Discord Developer Portal. Sign in with your Discord account, then click New Application in the top-right corner. Give it a name like "Hermes Agent" and accept the Developer Terms of Service.
Get the Bot Token
Navigate to the Bot section in the left sidebar. Click Reset Token and copy the token immediately. Discord only displays it once. This token is the credential Hermes uses to authenticate as your bot, so treat it like a password. Never commit it to Git or share it in a public channel.
Enable Privileged Gateway Intents
Still in the Bot section, scroll to Privileged Gateway Intents and enable these two toggles:
- Server Members Intent: required for resolving usernames in server channels
- Message Content Intent: required for the bot to read message text
Without Message Content Intent, your bot receives message events but the content arrives empty. This is the most common reason a new Hermes Discord bot appears online but never responds. If your bot is in 100 or more servers, Discord requires a verification application for privileged intents.
Generate the Invite URL
Go to the Installation tab and enable Guild Install. Set the scopes to bot and applications.commands, then apply these permissions:
Required permissions:
- View Channels
- Send Messages
- Embed Links
- Attach Files
- Read Message History
Recommended additions:
- Send Messages in Threads
- Add Reactions
For voice channel support, you also need:
- Connect
- Speak
- Use Voice Activity
Copy the generated invite URL, open it in your browser, select your server, and authorize the bot.
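The permission checklist above ends up as a single integer in the invite URL. The following is a minimal sketch of how that integer is assembled, using the permission bit positions from Discord's developer documentation. The helper names and the client ID are illustrative placeholders and are not part of Hermes; the Developer Portal generates the same URL for you.

```python
from urllib.parse import urlencode

# Permission bit positions as documented by Discord.
PERMISSION_BITS = {
    "add_reactions": 6,
    "view_channel": 10,
    "send_messages": 11,
    "embed_links": 14,
    "attach_files": 15,
    "read_message_history": 16,
    "connect": 20,
    "speak": 21,
    "use_vad": 25,  # shown as "Use Voice Activity" in the portal
    "send_messages_in_threads": 38,
}

REQUIRED = ["view_channel", "send_messages", "embed_links",
            "attach_files", "read_message_history"]
RECOMMENDED = ["send_messages_in_threads", "add_reactions"]
VOICE = ["connect", "speak", "use_vad"]


def permissions_value(names):
    """Combine named permissions into the integer Discord expects."""
    value = 0
    for name in names:
        value |= 1 << PERMISSION_BITS[name]
    return value


def invite_url(client_id, names):
    """Build an OAuth2 invite URL with the bot and applications.commands scopes."""
    query = urlencode({
        "client_id": client_id,
        "scope": "bot applications.commands",
        "permissions": permissions_value(names),
    })
    return f"https://discord.com/oauth2/authorize?{query}"


# Placeholder application ID; use the one shown in the Developer Portal.
print(invite_url("123456789012345678", REQUIRED + RECOMMENDED + VOICE))
```

If the bot later lacks a permission, comparing the integer in your invite URL against these bits is a quick way to see which checkbox was missed.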
Find Your Discord User ID
Hermes denies all users by default as a security measure. To allow yourself access, you need your numeric Discord User ID (not your display name). In Discord, go to Settings, then Advanced, and enable Developer Mode. Right-click your own username anywhere and select Copy User ID.
Step 2: Configure Hermes Agent for Discord
With the bot created and invited, configure Hermes to connect to it.
Install with Messaging Support
If you haven't installed Hermes yet, use the messaging extra to pull in Discord dependencies:
pip install "hermes-agent[messaging]"
This includes discord.py with voice support built in. For the full package with CLI voice, premium TTS, and all messaging platforms:
pip install "hermes-agent[all]"
System Dependencies for Voice
Voice channels require the Opus audio codec and FFmpeg. Install them for your platform:
macOS:
brew install portaudio ffmpeg opus espeak-ng
Ubuntu/Debian:
sudo apt install portaudio19-dev ffmpeg libopus0 espeak-ng
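Before starting the gateway, it can help to confirm that the two voice dependencies are actually discoverable. This is a small standard-library sketch, not a Hermes command; the function name is made up for illustration.

```python
import ctypes.util
import shutil


def check_voice_dependencies():
    """Report whether FFmpeg and the Opus library can be located."""
    return {
        # FFmpeg must be on the system path.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Searches the platform's standard library locations for libopus.
        "opus": ctypes.util.find_library("opus") is not None,
    }


if __name__ == "__main__":
    for name, found in check_voice_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A "MISSING" result for opus is a hint rather than proof: libraries installed in non-standard prefixes may not be found by this lookup even though the bot can load them.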
Run the Interactive Setup
The fastest path is the guided wizard:
hermes gateway setup
Select Discord when prompted, paste your bot token, and enter your User ID. The wizard writes the configuration to ~/.hermes/.env.
For manual configuration, add these lines to ~/.hermes/.env directly:
DISCORD_BOT_TOKEN=your-bot-token-here
DISCORD_ALLOWED_USERS=284102345871466496
Replace the User ID with your own. For multiple users, separate IDs with commas.
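A frequent mistake is pasting a display name instead of a numeric ID. The sketch below shows one way to validate a comma-separated ID list before putting it in the .env file. It is an illustration of the format only and is not Hermes's own parser; the second ID is made up.

```python
def parse_id_list(raw):
    """Split a comma-separated ID string into a set of integer IDs."""
    ids = set()
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and blank entries
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a numeric Discord ID: {part!r}")
        ids.add(int(part))
    return ids


# One real-looking ID from the example above plus a made-up second one.
print(parse_id_list("284102345871466496, 198765432109876543"))
```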
Start the Gateway
hermes gateway
The gateway connects to Discord and all other configured platforms. You should see your bot come online in Discord within a few seconds. Send it a DM or mention it in a server channel to confirm it responds.
Step 3: Set Up Voice Channel Support
Voice is where the Hermes Discord integration goes beyond text chatbots. The bot joins a Discord voice channel, listens to each user independently, transcribes their speech, processes it through the full agent pipeline (including tool use and memory), and speaks the response back.
Configure Speech-to-Text
Hermes supports three STT providers for transcribing voice input:
Local Whisper (free, no API key):
Install faster-whisper locally. The base model downloads automatically at around 150 MB. This runs entirely on your machine with no external API calls.
Groq Whisper (free tier available):
Add GROQ_API_KEY to ~/.hermes/.env. Uses whisper-large-v3-turbo with roughly 0.5-second latency.
OpenAI Whisper (paid):
Add VOICE_TOOLS_OPENAI_KEY to ~/.hermes/.env. Supports whisper-1 and gpt-4o-transcribe models.
For a Discord bot that stays responsive in voice channels, Groq or local Whisper work well. OpenAI Whisper adds 1-2 seconds of latency per transcription, which is noticeable in real-time conversation.
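The latency trade-off above can be turned into a simple preference order. The sketch below is hypothetical: the function name and the priority (Groq, then local, then OpenAI) are assumptions that mirror the recommendation in this guide, not a description of how Hermes selects a provider internally.

```python
import importlib.util
import os


def pick_stt_provider(env, local_available):
    """Return the first usable provider, preferring the lower-latency options."""
    if env.get("GROQ_API_KEY"):
        return "groq"
    if local_available:
        return "local"
    if env.get("VOICE_TOOLS_OPENAI_KEY"):
        return "openai"
    return None  # nothing configured


if __name__ == "__main__":
    # Local Whisper counts as available if faster-whisper is importable.
    has_local = importlib.util.find_spec("faster_whisper") is not None
    print(pick_stt_provider(os.environ, has_local))
```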
Configure Text-to-Speech
For the bot's spoken responses, choose a TTS provider. Edge TTS is the default and works without any API key. For higher-quality voice output, ElevenLabs produces the most natural-sounding speech; add ELEVENLABS_API_KEY to your .env file to enable it. OpenAI TTS (paid) and NeuTTS (free, CPU-dependent) are also supported.
Join a Voice Channel
Once the gateway is running with voice dependencies installed, use slash commands in any Discord text channel:
- /voice join: Bot joins your current voice channel
- /voice leave: Bot disconnects from the voice channel
- /voice status: Shows current voice mode and connected channel
- /voice tts: Enables spoken audio responses for all messages (not just voice)
When active in a voice channel, the bot detects speech from each user, waits for a 1.5-second silence after at least 0.5 seconds of speech, then transcribes and processes the input. Transcripts appear in the text channel as [Voice] @user: what you said, so you have a written record of the conversation.
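The timing rule described above (at least 0.5 seconds of speech, closed by 1.5 seconds of silence) can be sketched as a small state machine. This is a simplified illustration and not Hermes's implementation: the class name and the 20 ms frame size are assumptions, and deciding whether a frame contains speech is left to a separate voice-activity detector.

```python
from dataclasses import dataclass

MIN_SPEECH_MS = 500    # minimum speech before an utterance counts
END_SILENCE_MS = 1500  # trailing silence that closes the utterance


@dataclass
class Endpointer:
    speech_ms: int = 0
    silence_ms: int = 0

    def feed(self, is_speech, frame_ms=20):
        """Consume one audio frame; return True when an utterance just ended."""
        if is_speech:
            self.speech_ms += frame_ms
            self.silence_ms = 0
            return False
        if self.speech_ms == 0:
            return False  # still waiting for speech to start
        self.silence_ms += frame_ms
        if self.silence_ms < END_SILENCE_MS:
            return False
        long_enough = self.speech_ms >= MIN_SPEECH_MS
        self.speech_ms = self.silence_ms = 0  # reset for the next utterance
        return long_enough
```

In a per-user setup you would keep one Endpointer per speaker and send the buffered audio to transcription whenever feed returns True. Short blips under half a second are dropped instead of being transcribed.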
The bot includes echo prevention that automatically pauses audio listening while playing its own TTS responses. A hallucination filter catches phantom transcriptions from background noise, removing known false-positive phrases across multiple languages.
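A phrase-based hallucination filter of the kind described here can be sketched in a few lines. The phrase list below is purely illustrative (Hermes ships its own list, which is not reproduced here), and the function names are hypothetical.

```python
import re

# Illustrative entries only; the real list covers multiple languages.
KNOWN_PHANTOMS = {"thank you for watching", "thanks for watching"}


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    without_punctuation = re.sub(r"[^\w\s]", "", text)
    return " ".join(without_punctuation.lower().split())


def is_phantom(transcript):
    """True if the transcript is empty or matches a known false positive."""
    cleaned = normalize(transcript)
    return cleaned == "" or cleaned in KNOWN_PHANTOMS
```

Matching on the whole normalized transcript, rather than on substrings, keeps the filter from discarding real sentences that happen to contain one of the phrases.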
Voice Access Control
Only users listed in DISCORD_ALLOWED_USERS can interact through voice. Audio from unauthorized users is silently ignored. This is the same access control that governs text interactions, so no separate voice permission list is needed.
Give Your Discord Agent Persistent File Storage
Fast.io's free agent plan includes 50 GB of storage and MCP server access, so your Hermes bot can save, search, and share files beyond the Discord channel. No credit card required.
Tuning Channel Behavior and Access Control
The Discord gateway exposes detailed controls for how the bot behaves across different channel types.
Channel Response Modes
By default, the bot responds to every DM without requiring a mention, but needs an @mention in server channels. You can change this per channel:
Free-response channels let the bot reply without mentions. Set DISCORD_FREE_RESPONSE_CHANNELS in your .env file with a comma-separated list of channel IDs. This works well for a dedicated "talk to the agent" channel.
Ignored channels prevent the bot from responding at all. Set DISCORD_IGNORED_CHANNELS for channels where the bot should stay silent.
No-thread channels make the bot reply inline instead of creating a new thread. By default, each @mention in a text channel creates a thread for the conversation. Set DISCORD_NO_THREAD_CHANNELS for channels where inline replies work better.
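The three channel lists combine into two decisions: whether to respond at all, and whether to reply in a thread. The sketch below illustrates that logic under stated assumptions. The function names are made up, the rule that an ignored channel wins over a free-response channel is a guess, and DMs are assumed to be answered inline.

```python
def should_respond(channel_id, is_dm, mentioned, free_response, ignored):
    """Decide whether the bot answers a message in this channel."""
    if is_dm:
        return True  # DMs never require a mention
    if channel_id in ignored:
        return False  # assumption: ignored takes priority over free-response
    return mentioned or channel_id in free_response


def reply_style(channel_id, is_dm, no_thread):
    """Return 'inline' or 'thread' for a reply in this channel."""
    if is_dm or channel_id in no_thread:
        return "inline"
    return "thread"
```

For example, with a free-response set of {111} and an ignored set of {222}, an unmentioned message in channel 111 gets a reply, while even an @mention in channel 222 does not.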
Session Isolation
Hermes isolates sessions per user by default. If Alice and Bob both talk to the bot in the same channel, they get separate conversation contexts. Neither sees the other's history, and one user's long conversation does not consume context window space for the other.
For shared sessions where everyone in a channel contributes to the same conversation, set group_sessions_per_user: false in config.yaml. This is useful for team brainstorming channels where the agent should have full context of everything everyone says.
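One way to picture the difference between the two modes is as a choice of lookup key for the conversation history. This is a conceptual sketch, not Hermes's session store; the function name, the key layout, and the IDs are all made up.

```python
from collections import defaultdict


def session_key(platform, chat_id, user_id, per_user=True):
    """Build the lookup key for a conversation's history."""
    if per_user:
        return (platform, chat_id, user_id)  # isolated: one history per user
    return (platform, chat_id)               # shared: one history per channel


histories = defaultdict(list)

# Alice and Bob post in the same channel (IDs are made up).
histories[session_key("discord", 555, 1001)].append("alice: summarize the report")
histories[session_key("discord", 555, 1002)].append("bob: draft a reply")
print(len(histories))  # two separate histories with per-user isolation
```

With per_user=False both messages would land under the same key, which is the behavior you want in a shared brainstorming channel.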
Role-Based Access
Instead of listing individual User IDs, you can authorize Discord roles. Add role IDs to DISCORD_ALLOWED_ROLES:
DISCORD_ALLOWED_ROLES=987654321098765432,876543210987654321
Users are authorized if their ID is in the allowed users list OR they hold an allowed role. This makes it practical to manage access for larger teams without updating the .env file every time someone joins.
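The OR rule, combined with the deny-by-default behavior mentioned earlier, can be expressed as a short check. This is a sketch of the logic as described in this guide, with a hypothetical function name and made-up IDs; it is not code taken from Hermes.

```python
def is_authorized(user_id, role_ids, allowed_users, allowed_roles):
    """Allow a user who is listed directly or who holds any allowed role."""
    if user_id in allowed_users:
        return True
    # True if the member's roles and the allowed roles overlap.
    return not allowed_roles.isdisjoint(role_ids)


allowed_users = {284102345871466496}
allowed_roles = {987654321098765432, 876543210987654321}

# A user who is not listed individually but holds an allowed role.
print(is_authorized(42, [987654321098765432], allowed_users, allowed_roles))
```

With both sets empty the function returns False for everyone, which matches the deny-all default.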
Slash Command Restrictions
You can limit which slash commands regular users can run while keeping full access for admins. In config.yaml, use allow_admin_from to designate admin User IDs and user_allowed_commands to whitelist specific commands for everyone else. The /help and /whoami commands are always available.
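The resulting permission model has three tiers: commands that are always available, commands on the user allowlist, and everything else, which is reserved for admins. The sketch below illustrates that model. The function name and the sample command names are hypothetical, and the exact layout of these options in config.yaml is not shown here.

```python
ALWAYS_AVAILABLE = frozenset({"help", "whoami"})


def can_run(command, user_id, admin_ids, user_allowed_commands):
    """Check whether a user may run a slash command."""
    name = command.lstrip("/")
    if name in ALWAYS_AVAILABLE or user_id in admin_ids:
        return True
    return name in user_allowed_commands


admins = {284102345871466496}
allowed_for_everyone = {"voice"}  # made-up allowlist for illustration

print(can_run("/voice", 42, admins, allowed_for_everyone))
```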
Skills installed on your Hermes Agent automatically register as native Discord slash commands, appearing in Discord's autocomplete menu. The limit is 100 application commands per bot. If you run multiple gateway instances against the same Discord application, set slash_commands: false on non-primary instances to prevent registration conflicts.
Persisting Agent Files With Fast.io Workspaces
The problem with most Discord bot deployments is file persistence. Hermes can generate files, send them as Discord attachments, and receive uploads from users. But those files live on the host machine's filesystem. If the container restarts, the disk fills, or you need to share agent output with someone who is not in Discord, the files are stuck.
Local storage works for experimentation. For production Discord bots, consider your options:
Local filesystem: Zero setup, but files are lost on container restarts. No collaboration features. Fine for a personal bot on a dedicated machine.
S3 or similar object storage: Durable, but requires IAM configuration, bucket policies, and custom integration code. No built-in search or collaboration.
Fast.io: Persistent workspace storage with an MCP server that agents connect to directly. Files uploaded to a workspace are automatically indexed for semantic search when Intelligence is enabled, so your agent's output becomes queryable without building a separate retrieval pipeline.
Fast.io's free agent plan includes 50 GB of storage, 5,000 credits per month, and 5 workspaces with no credit card required. The MCP server exposes Streamable HTTP at /mcp and legacy SSE at /sse, giving your agent programmatic access to upload files, create shares, and query workspace contents.
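For readers unfamiliar with Streamable HTTP, an MCP session begins with a JSON-RPC initialize request posted to the /mcp endpoint. The sketch below only builds that request, following the MCP specification. The host name is a placeholder, the client name is made up, and authentication is omitted because the required credentials are not described in this guide. In practice an MCP client library or Hermes's own MCP support would handle this exchange for you.

```python
import json
import urllib.request

# Placeholder host; substitute the MCP server URL from your Fast.io account.
MCP_BASE_URL = "https://mcp.example.invalid"


def build_initialize_request(base_url):
    """Build (but do not send) the first request of an MCP session."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "hermes-discord-example", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request(MCP_BASE_URL)
print(request.get_method(), request.full_url)
```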
A practical pattern for Discord bots: your Hermes Agent processes requests in Discord, generates output files (reports, images, processed data), uploads them to a Fast.io workspace through the MCP server, and shares a link back in the Discord channel. The workspace becomes the durable layer where files are versioned, searchable, and accessible to humans through the web interface, even team members who never touch Discord.
Ownership transfer is another useful feature for Discord bots that serve clients. An agent can build a workspace, populate it with deliverables, then transfer the organization to a human recipient. The agent retains admin access for ongoing updates while the human owns the workspace. This is how you move from "agent generated some files in a Discord channel" to "client has a branded workspace with everything organized."
For team scenarios, file locks prevent conflicts when multiple agents or users modify the same files. Webhooks let you build reactive workflows: your Discord bot can receive a notification when a file changes in the workspace and respond in the relevant channel without polling.
Deployment and Troubleshooting
Hermes supports several deployment backends. The gateway runs on whichever backend you choose, and the Discord bot connects through it.
Local: Commands execute directly on your machine. Simplest setup for personal use.
Docker: Hermes reuses a single long-lived container, auto-mounts the skills directory, and persists installed dependencies between tool calls. Good for isolating the agent's environment from your host system.
SSH: Run the agent on a remote server. Hermes syncs modified files back to the host on session teardown.
Modal: Serverless execution that hibernates when idle and wakes on demand. Useful for bots that handle intermittent traffic without running a server 24/7.
Singularity/Apptainer: For HPC clusters where Docker is not available.
For a Discord bot that should stay online continuously, Docker or a dedicated server (local or SSH) are the most reliable options. Modal works well for bots that are active during specific hours.
Common Issues and Fixes
Bot is online but doesn't respond to messages: The most frequent cause is Message Content Intent being disabled. Go to the Discord Developer Portal, then Bot, then Privileged Gateway Intents, and confirm Message Content Intent is toggled on. Save and restart the gateway.
"Disallowed Intents" error on startup: The gateway requested a privileged intent that is not enabled for your application. In the Bot settings, confirm that Server Members Intent and Message Content Intent from Step 1 are on. If the error persists, enable Presence Intent as well so that all three Privileged Gateway Intents are active, then restart the gateway.
Bot can't see a specific channel: Check that the bot's role has View Channels and Read Message History permissions for that channel. Discord's permission system is hierarchical, so a channel override can block access even when the role has server-wide permissions.
"User not allowed" responses: Your Discord User ID is missing from DISCORD_ALLOWED_USERS and you don't hold a role listed in DISCORD_ALLOWED_ROLES. Verify with /whoami if the bot responds to that command, or check your .env file and restart the gateway.
Voice channel: bot joins but doesn't speak: Confirm that libopus is installed (libopus.dylib on macOS, libopus.so.0 on Linux) and that FFmpeg is available on the system path. Without the Opus codec, Discord voice communication fails silently.
Unexpected context sharing between users: The default setting group_sessions_per_user: true isolates sessions. If users are seeing each other's conversations, check that this setting hasn't been changed in config.yaml.
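The channel-visibility issue above is easier to reason about with Discord's overwrite rule in front of you: each channel overwrite first clears its denied bits, then sets its allowed bits. The sketch below is a simplified illustration of that documented rule. The function names are made up, and the Administrator bypass and the step that merges multiple role overwrites are left out.

```python
VIEW_CHANNEL = 1 << 10
READ_MESSAGE_HISTORY = 1 << 16


def apply_overwrite(permissions, allow, deny):
    """Apply one channel overwrite: denied bits are cleared, then allowed bits set."""
    return (permissions & ~deny) | allow


def can_see_channel(base_permissions, overwrites):
    """Check visibility after applying (allow, deny) pairs in Discord's order:
    the @everyone overwrite, then role overwrites, then the member overwrite."""
    permissions = base_permissions
    for allow, deny in overwrites:
        permissions = apply_overwrite(permissions, allow, deny)
    needed = VIEW_CHANNEL | READ_MESSAGE_HISTORY
    return (permissions & needed) == needed


server_wide = VIEW_CHANNEL | READ_MESSAGE_HISTORY

# A channel that denies View Channels to @everyone hides it from the bot...
print(can_see_channel(server_wide, [(0, VIEW_CHANNEL)]))
# ...unless a later overwrite for the bot's role allows it again.
print(can_see_channel(server_wide, [(0, VIEW_CHANNEL), (VIEW_CHANNEL, 0)]))
```

This is why granting the permission on the role alone is sometimes not enough: the fix has to be made in the channel's own permission settings.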
Frequently Asked Questions
Can Hermes Agent run as a Discord bot?
Yes. Discord is one of over 20 supported messaging gateways in Hermes Agent. The bot handles text messages, slash commands, file attachments, threaded conversations, and voice channels through a single background gateway process.
How do I set up Hermes Agent on Discord?
Create a bot application in the Discord Developer Portal, enable Message Content Intent and Server Members Intent, copy the bot token, then run "hermes gateway setup" and select Discord. Paste your token and User ID, start the gateway, and the bot comes online.
Does Hermes Agent support Discord voice channels?
Yes. The bot can join a voice channel, listen to users speaking, transcribe speech using Whisper (local, Groq, or OpenAI), process the input through the full agent pipeline, and speak the response back using configurable TTS providers including Edge TTS, ElevenLabs, and OpenAI TTS.
Can I use Hermes Agent in my Discord server?
Yes. Invite the bot using the OAuth2 URL from the Developer Portal with bot and applications.commands scopes. Set DISCORD_ALLOWED_USERS or DISCORD_ALLOWED_ROLES to control who can interact with it. The bot works in DMs, server channels, threads, and voice channels.
How does Hermes Agent handle multiple users in one Discord channel?
By default, sessions are isolated per user. Alice and Bob in the same channel get separate conversation contexts with independent memory and history. You can switch to shared sessions in config.yaml if your use case requires a collaborative conversation where the agent sees everything from all participants.
What TTS and STT providers does Hermes Agent support for Discord voice?
For speech-to-text, Hermes supports local faster-whisper (free, no API key), Groq Whisper (free tier), and OpenAI Whisper (paid). For text-to-speech, options include Edge TTS (free), ElevenLabs (paid, highest quality), OpenAI TTS (paid), and NeuTTS (free, CPU-dependent).