How to Set Up an MCP Server for Hugging Face
An MCP server for Hugging Face lets AI agents interact with Hugging Face Hub through the Model Context Protocol, enabling model discovery, dataset access, inference API calls, and Space management from any MCP-compatible client. This guide covers setting up the official Hugging Face MCP server, configuring it for different clients, extending it with community Spaces, and connecting it to persistent storage for production agent workflows.
What a Hugging Face MCP Server Does
Hugging Face Hub hosts over 2 million models, 500,000 datasets, and 1 million Spaces. That is a massive surface for AI agents to work with, but without a structured interface, agents need custom HTTP code for every API call, plus token management, pagination handling, and error recovery.
The Model Context Protocol solves this by giving agents a standard way to call external tools. The official Hugging Face MCP server wraps the Hub API into discrete, callable tools: search for models by task or library, find datasets by topic or author, run Gradio apps hosted on Spaces, and query the Hugging Face documentation with natural language.
The practical result: instead of writing API integration code, you point your MCP client at https://huggingface.co/mcp, authenticate, and your agent gets access to the entire Hub through named tool calls. Ask it to "find the best text-to-image model under 2B parameters" and it calls the model search tool, filters results, and returns structured metadata with download counts and links.
This matters for agent workflows because Hugging Face is where most open-source ML happens. If your agent needs to select a model for a task, check what datasets are available for fine-tuning, or run inference through a hosted Space, an MCP server eliminates the integration overhead.
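To make "named tool calls" concrete: MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a request; it does not send it. The tool name `model_search` and the argument names `query` and `limit` are assumptions for illustration, so check the names your client actually lists for the server.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument names, for illustration only.
request = build_tool_call("model_search", {"query": "text-to-image", "limit": 5})
print(json.dumps(request, indent=2))
```

In practice your MCP client constructs and sends these messages for you; the point is that every Hub operation reduces to the same small envelope rather than a bespoke HTTP call.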
Built-in Tools and Capabilities
The Hugging Face MCP server ships with seven built-in tools, each of which you can enable or disable from your MCP settings.
Model Search finds ML models with filters for task type, library, and other metadata. Your agent can search for "PyTorch image classification models sorted by downloads" and get structured results with model IDs, download counts, and direct links.
Dataset Search works the same way for datasets, with filters for author, tags, size, and language. Useful when agents need to identify training data for a specific domain.
Hub Repository Details returns full metadata for any model, dataset, or Space. Enable the "Include repository README files" option to let your agent read model cards and documentation, which is critical for understanding a model's capabilities and limitations before using it.
Spaces Semantic Search finds the best AI apps via natural language queries. Since Hugging Face hosts over a million Spaces, many of them MCP-compatible Gradio apps, this tool lets agents discover and call community-built ML tools dynamically.
Papers Semantic Search finds ML research papers by topic. Agents working on research workflows can search for recent papers on a technique, extract key findings, and summarize the state of the art.
Documentation Semantic Search queries Hugging Face's documentation using natural language. If your agent needs to figure out how to use LoRA adapters with PEFT or configure a Trainer, it can search the docs directly instead of guessing.
Run and Manage Jobs lets agents run, monitor, and schedule compute jobs on Hugging Face infrastructure. This is useful for fine-tuning workflows where an agent needs to kick off a training job and check back when it finishes.
Step-by-Step: Connect the Hugging Face MCP Server
The setup process differs slightly by client, but the core pattern is the same: point your client at the Hugging Face MCP endpoint and authenticate with your token.
Get Your Hugging Face Token
Go to huggingface.co/settings/tokens and create a token with READ permissions. You will need this for authentication regardless of which client you use. If you plan to run or manage Spaces, create a token with WRITE permissions instead.
Claude Code
The fastest setup for Claude Code is a single CLI command:
claude mcp add hf-mcp-server \
  -t http https://huggingface.co/mcp \
  -H "Authorization: Bearer YOUR_HF_TOKEN"
Replace YOUR_HF_TOKEN with your actual token. This registers the Hugging Face server as an HTTP transport MCP server in your Claude Code configuration.
For browser-based OAuth login (no token needed in the command), append ?login to the URL. Quote the URL so that shells such as zsh do not treat the ? as a glob character:
claude mcp add hf-mcp-server -t http "https://huggingface.co/mcp?login"
VS Code and Cursor
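Add a huggingface entry to your MCP configuration file. In VS Code, open .vscode/mcp.json or your user settings. In Cursor, open the MCP settings panel. The entry goes inside the file's server map, which is the "servers" object in VS Code's mcp.json and the "mcpServers" object in Cursor's.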
{
  "huggingface": {
    "url": "https://huggingface.co/mcp",
    "headers": {
      "Authorization": "Bearer YOUR_HF_TOKEN"
    }
  }
}
Save and restart your editor. The Hugging Face server should appear in your MCP server list.
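Pasting a token into a config file makes it easy to commit the secret by accident. One way around that, sketched here under the assumption that your token is exported as the `HF_TOKEN` environment variable, is to generate the `huggingface` entry shown above instead of typing it by hand. The function name `hf_mcp_entry` is hypothetical.

```python
import json
import os


def hf_mcp_entry(token: str) -> dict:
    """Return the `huggingface` server entry with the given token filled in."""
    if not token:
        raise ValueError("token is empty; export HF_TOKEN first")
    return {
        "huggingface": {
            "url": "https://huggingface.co/mcp",
            "headers": {"Authorization": f"Bearer {token}"},
        }
    }


# Falls back to the placeholder so the sketch runs even without a token set.
entry = hf_mcp_entry(os.environ.get("HF_TOKEN", "YOUR_HF_TOKEN"))
print(json.dumps(entry, indent=2))
```

Merge the printed entry into your editor's server map and keep the generated file out of version control.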
Claude Desktop
Claude Desktop supports adding the Hugging Face MCP server directly from its connector gallery. Navigate to Settings, then Connectors, and select "Hugging Face" from the available options. Alternatively, you can click the installation link provided at huggingface.co/settings/mcp after selecting "Claude Desktop" as your client.
Gemini CLI
gemini mcp add -t http huggingface "https://huggingface.co/mcp?login"
Then run /mcp auth huggingface inside a Gemini CLI session to complete the OAuth flow.
Verify the Connection
After setup, test with a simple prompt like "Search Hugging Face for text generation models sorted by downloads." Your agent should call the Model Search tool and return a list of models with metadata. If you get a connection error, confirm your token has READ permissions and that the server URL is correct.
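The two causes named above, the token and the URL, can be sanity-checked offline before you restart the client. This is a minimal sketch, not an official diagnostic: the `preflight` helper is hypothetical, it assumes Hugging Face user access tokens start with `hf_`, and it cannot tell whether a token actually has READ permissions.

```python
from typing import List, Optional
from urllib.parse import urlparse


def preflight(url: str, token: Optional[str]) -> List[str]:
    """Return likely configuration problems; an empty list means none found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL should use https")
    if parsed.netloc != "huggingface.co" or parsed.path != "/mcp":
        problems.append("URL should be https://huggingface.co/mcp")
    # With ?login the client authenticates through OAuth instead of a token.
    uses_oauth = parsed.query == "login"
    if not token and not uses_oauth:
        problems.append("no token supplied and ?login is not set")
    # Assumption: Hugging Face user access tokens start with "hf_".
    if token and not token.startswith("hf_"):
        problems.append("token does not look like a Hugging Face token")
    return problems


print(preflight("https://huggingface.co/mcp", "hf_example"))
print(preflight("http://huggingface.co/mpc", None))
```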
Give Your Agents Persistent Storage for ML Workflows
Fast.io's free agent tier includes 50 GB storage, 5,000 credits per month, and a built-in MCP server. Connect your Hugging Face agent workflows to shared workspaces where teams can search, query, and review agent output. No credit card required.
Extend with Community Spaces
The built-in tools cover Hub search and navigation, but the real power of the Hugging Face MCP server is its ability to use community-built Gradio Spaces as additional tools.
Thousands of Spaces on Hugging Face are MCP-compatible Gradio apps. Each one exposes its functions as callable MCP tools with typed arguments and descriptions. An image generation Space becomes a "generate image" tool. A speech-to-text Space becomes a "transcribe audio" tool. Your agent can call these directly, just like the built-in search tools.
Adding Spaces to Your Setup
- Browse MCP-compatible Spaces at huggingface.co/spaces?filter=mcp-server
- Open your MCP settings at huggingface.co/settings/mcp
- Add the Space you want to use
- Restart your MCP client to pick up the new tools
Dynamic Spaces (Experimental)
The settings page includes a "Dynamic Spaces" option that lets your agent discover and use MCP-compatible Spaces at runtime without adding them manually. When enabled, your agent can find a Space that matches its needs and call it in the same turn. This is useful for exploratory workflows where the agent does not know ahead of time which tools it will need.
Practical Examples
With the right Spaces enabled, your agent can handle multi-step ML workflows:
- Search for a text-to-image model, then call a generation Space to create an image and evaluate the output
- Find a dataset for sentiment analysis, then run a classification Space to test sample predictions
- Discover a speech-to-text Space, transcribe an audio file, and summarize the transcript using a language model
Each of these workflows chains built-in search tools with community Space tools, all through the same MCP interface.
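The chaining pattern in the first workflow can be sketched as a function that takes a generic tool caller. Everything named here is hypothetical: the tool names `model_search` and `generate_image`, their arguments, and the canned responses stand in for whatever your MCP client and enabled Spaces actually expose.

```python
from typing import Any, Callable

# A tool caller takes a tool name and its arguments and returns the result.
ToolCaller = Callable[[str, dict], Any]


def search_then_generate(call_tool: ToolCaller, prompt: str) -> dict:
    """Chain a built-in search tool with a community Space tool."""
    models = call_tool("model_search", {"task": "text-to-image", "limit": 1})
    image = call_tool("generate_image", {"prompt": prompt})
    # Keep the model id next to the output so reviewers can trace its origin.
    return {"model": models[0]["id"], "prompt": prompt, "image": image}


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client, returning canned responses."""
    canned = {
        "model_search": [{"id": "example-org/example-model"}],
        "generate_image": "image-bytes-placeholder",
    }
    return canned[name]


result = search_then_generate(fake_call_tool, "a lighthouse at dusk")
print(result)
```

Passing the caller in as an argument keeps the workflow independent of any particular client library, which also makes it easy to test with a stub as shown.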
Running the Server Locally
The hosted endpoint at huggingface.co/mcp works for most setups, but you can also run the Hugging Face MCP server locally. This is useful for air-gapped environments, custom tool configurations, or when you need to modify the server's behavior.
NPX (Quickest)
npx @llmindset/hf-mcp-server
This starts the server in STDIO mode, which works with any MCP client that supports local server processes. For HTTP mode:
npx @llmindset/hf-mcp-server-http
The HTTP server listens on port 3000 and serves the /mcp endpoint, so a client on the same machine connects to http://localhost:3000/mcp.
Docker
docker pull ghcr.io/evalstate/hf-mcp-server:latest
docker run --rm -p 3000:3000 ghcr.io/evalstate/hf-mcp-server:latest
Environment Variables
When running locally, configure the server with these environment variables:
- TRANSPORT: Set to stdio, streamableHttp, or streamableHttpJson depending on your client's requirements
- HF_API_TIMEOUT: API request timeout in milliseconds (default: 12,500 ms). Increase this if you are on a slow connection or calling compute-heavy Spaces
- MCP_PING_INTERVAL: Keep-alive frequency in milliseconds (default: 30,000 ms)
Set your Hugging Face token as the HF_TOKEN environment variable. The local server reads it automatically.
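If you start the local server from a script, the variables above can be assembled in one place. This is a sketch under stated assumptions: the helper names `server_env` and `launch` are hypothetical, the defaults mirror the values listed above, and `launch` is defined but not called because it needs Node.js and network access.

```python
import os
import subprocess
from typing import Dict, Optional

VALID_TRANSPORTS = {"stdio", "streamableHttp", "streamableHttpJson"}


def server_env(
    transport: str = "stdio",
    api_timeout_ms: int = 12_500,
    ping_interval_ms: int = 30_000,
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return an environment for the local server, layered over base_env."""
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    # Inherit the current environment by default so PATH and HF_TOKEN carry over.
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "TRANSPORT": transport,
            "HF_API_TIMEOUT": str(api_timeout_ms),
            "MCP_PING_INTERVAL": str(ping_interval_ms),
        }
    )
    return env


def launch(package: str, env: Dict[str, str]) -> subprocess.Popen:
    """Start the server with npx. Requires Node.js; not called in this sketch."""
    return subprocess.Popen(["npx", package], env=env)


env = server_env(transport="streamableHttp", api_timeout_ms=30_000)
print({key: env[key] for key in ("TRANSPORT", "HF_API_TIMEOUT", "MCP_PING_INTERVAL")})
```

To start the server you would pass one of the package names from the NPX section, for example `launch("@llmindset/hf-mcp-server", server_env())`.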
Transport Modes
The server supports three transport protocols:
- STDIO: Local connections only. The client spawns the server as a subprocess. Simplest setup, no network configuration needed.
- Streamable HTTP: Remote connections over HTTP. This is what the hosted endpoint uses and is the recommended transport for production deployments.
- SSE (deprecated): Server-Sent Events transport. Still supported in the open-source code but deprecated since March 2025. Use Streamable HTTP for new setups.
Connecting Agent Output to Persistent Storage
The Hugging Face MCP server handles model discovery, dataset search, and inference. But agents that use these tools in production workflows need somewhere to store the results: generated images, downloaded datasets, model evaluation reports, and pipeline artifacts.
Local filesystems work for one-off experiments. But when agents run as background processes, operate across sessions, or hand off work to human reviewers, local storage breaks down. You need persistent, shared storage that agents can write to and humans can access without SSH-ing into a server.
Fast.io provides this as an intelligent workspace. Agents connect through the Fast.io MCP server and get 19 tools for file operations, search, and collaboration. Upload a model evaluation report, share a generated dataset with a teammate, or build a curated model catalog that persists across agent sessions.
The combination works well for ML workflows:
- Agent searches Hugging Face for models matching a task requirement
- Agent runs inference through a Space and saves the results
- Agent uploads outputs to a Fast.io workspace with Intelligence Mode enabled
- Team members search and query the results through Fast.io's built-in RAG, with no separate vector database needed
- When the work is ready, transfer workspace ownership from the agent to the human project lead
The free agent tier includes 50 GB of storage, 5,000 credits per month, and 5 workspaces with no credit card required and no expiration. For teams running multiple agents across different ML projects, each agent can operate in its own workspace with granular permissions and full audit trails.
You can also use Fast.io's URL Import to pull files from Google Drive, OneDrive, Box, or Dropbox into an agent workspace. This is useful when your training data lives in a different cloud service and your agent needs to reference it alongside Hugging Face resources.
Frequently Asked Questions
Can AI agents use Hugging Face models?
Yes. The Hugging Face MCP server exposes model search, dataset search, and Space execution as callable tools through the Model Context Protocol. Any MCP-compatible agent (Claude, GPT-4, Gemini, or agents built with LangChain, CrewAI, and similar frameworks) can search for models, read model cards, and run inference through hosted Spaces.
How do I connect Claude to Hugging Face?
In Claude Code, run: claude mcp add hf-mcp-server -t http https://huggingface.co/mcp -H "Authorization: Bearer YOUR_HF_TOKEN". In Claude Desktop, go to Settings, then Connectors, and add Hugging Face from the gallery. Both methods give Claude access to model search, dataset search, Spaces, and documentation tools.
Is there an official MCP server for Hugging Face?
Yes. Hugging Face maintains an official MCP server at huggingface.co/mcp. The source code is available on GitHub at huggingface/hf-mcp-server. It supports STDIO and Streamable HTTP transports and includes seven built-in tools plus the ability to add community Gradio Spaces as additional tools.
What can agents do with Hugging Face Hub?
Through the MCP server, agents can search models by task and library, search datasets by topic and tags, read model cards and documentation, run inference through MCP-compatible Gradio Spaces, search ML research papers, query Hugging Face documentation, and run compute jobs on Hugging Face infrastructure.
Do I need a paid Hugging Face account to use the MCP server?
No. A free Hugging Face account with a READ-permission token is sufficient for model search, dataset search, and documentation queries. Some Spaces and compute features may require a Pro account or ZeroGPU quota, but the core MCP tools work on free accounts.
Can I use the Hugging Face MCP server with non-Claude agents?
Yes. The server uses the open Model Context Protocol standard and works with any MCP-compatible client, including VS Code, Cursor, Zed, Gemini CLI, ChatGPT, and custom agents built with frameworks like LangChain or the Hugging Face Hub Python library's built-in MCPClient.