How to Manage Files in Rasa Chatbots
Rasa file management includes storing conversation data, training files, model artifacts, and user documents. Rasa handles training data well, but real-world assistants also need a plan for runtime files: handling user uploads, generating reports, and securing model versions. This guide covers Rasa chatbot file management with practical examples.
What is Rasa Chatbot File Management?
Rasa file management covers two main areas: organizing your project's training data and handling files while the bot runs. The first ensures your bot learns correctly and your team works well together. The second controls how your assistant handles user documents and outputs.
Static File Management involves organizing nlu.yml, stories.yml, domain.yml, and config.yml. As projects grow from prototypes to full assistants, a clean structure helps your team work faster.
Dynamic File Management handles the runtime data. This includes user-uploaded documents (like PDF invoices, images for claims processing, or ID verification), generated reports, and conversation state.
Rasa documentation focuses heavily on the static side (training data). But as chatbots do more, processing, validating, and storing user files securely becomes a bigger challenge.
What to Check Before Scaling Rasa Chatbot File Management
Organizing your Rasa project structure helps keep your assistant maintainable. As your bot grows, a single data/nlu.yml file becomes hard to manage, causing merge conflicts and lost training examples.
Modularize Your Data
Split your training data into smaller, intent-specific files. Instead of one big NLU file, use a folder structure. Rasa reads all files in the data/ directory.
Recommended Directory Structure:
project/
├── actions/
│ ├── __init__.py
│ └── actions.py
├── data/
│ ├── core/
│ │ ├── stories_onboarding.yml
│ │ └── stories_support.yml
│ ├── nlu/
│ │ ├── chitchat.yml
│ │ ├── faq.yml
│ │ └── business_logic.yml
│ └── rules.yml
├── models/
├── config.yml
├── domain.yml
└── endpoints.yml
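Once NLU data is split across files, it helps to see where each intent lives. The sketch below is one rough way to do that, assuming the files sit under a folder such as data/nlu/ and declare intents on lines like "- intent: greet". The function name intents_by_file is hypothetical, and the scan is line-based rather than a real YAML parse, so treat it as a quick inventory and not as validation.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Matches lines such as "- intent: greet" in Rasa NLU YAML files.
INTENT_LINE = re.compile(r"^\s*-\s*intent:\s*(\S+)\s*$")


def intents_by_file(nlu_dir: str) -> Dict[str, List[str]]:
    """Map each intent name to the sorted list of files that declare it."""
    found = defaultdict(set)
    for path in sorted(Path(nlu_dir).rglob("*.yml")):
        for line in path.read_text(encoding="utf-8").splitlines():
            match = INTENT_LINE.match(line)
            if match:
                found[match.group(1)].add(path.relative_to(nlu_dir).as_posix())
    return {intent: sorted(files) for intent, files in found.items()}
```

An intent that shows up in several files is not an error, since Rasa merges the examples, but it is often a sign that two people are editing the same intent in different places.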
Version Control Everything
Treat your data like code. Use Git to track every change in stories.yml and rules.yml. This lets you roll back to previous model versions if a new training set causes errors. For larger teams, consider tools that commit changes back to your Git repository, keeping everything in one place.
Externalize Responses
While you can keep responses in domain.yml, moving them to a dedicated responses.yml helps non-technical team members update the bot's tone without risking changes to the core logic. Rasa can merge a domain that is split across several YAML files in one directory, so the responses file can sit alongside the rest of your domain configuration.
Handling User File Uploads in Rasa
Rasa Open Source doesn't handle file uploads out of the box. When a user sends a file via Slack, Telegram, or a web widget, the input channel usually sends a payload with a file URL or ID, rather than the file itself.
To process these files, you need Custom Actions.
- Receive the Event: Your custom action (in actions.py) must extract the file link from the tracker payload. The location of this link varies by channel, so check which channel the message came from (e.g., with tracker.get_latest_input_channel()) before parsing the payload.
- Download and Validate: The action downloads the file to a temporary location (or directly to memory) and checks the file type (PDF, JPG, PNG).
- Process or Store: You might use OCR to read text immediately or store the file for a human agent to review later.
Example Custom Action for File Uploads:
import os
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionSaveUserFile(Action):
    def name(self) -> Text:
        return "action_save_user_file"

    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # Example: extracting the file URL from a Slack payload.
        # Note: the metadata structure depends on your specific connector.
        events = tracker.current_state()["events"]
        # Fall back to an empty dict if the tracker has no user event yet.
        user_event = next((e for e in reversed(events) if e["event"] == "user"), {})
        metadata = user_event.get("metadata") or {}
        file_url = metadata.get("file_url")
        if not file_url:
            dispatcher.utter_message(text="I didn't receive a file.")
            return []

        # Securely download the file. The token is read from the environment
        # (BOT_TOKEN is a placeholder variable name) instead of being hard-coded.
        token = os.environ.get("BOT_TOKEN", "")
        try:
            response = requests.get(
                file_url,
                headers={"Authorization": f"Bearer {token}"},
                timeout=10,  # avoid blocking the action server on a slow host
            )
        except requests.RequestException:
            dispatcher.utter_message(text="Failed to download file.")
            return []

        if response.status_code == 200:
            # Upload to Fast.io or S3 here instead of local disk
            # filename = secure_filename(file_url.split('/')[-1])
            # storage_client.upload(response.content, filename)
            dispatcher.utter_message(text="File saved successfully!")
        else:
            dispatcher.utter_message(text="Failed to download file.")
        return []
Security for Uploads
Security is important when you accept files from users. Check the file type by inspecting the file header (magic bytes), not just the extension. Bad actors can hide scripts in files named .pdf or .jpg. Use a strict list of allowed MIME types so your bot only touches safe content. Also, use a virus scanner (like ClamAV) before any file enters your internal systems.
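As a minimal sketch of the header check described above, the snippet below compares the first bytes of a download against the well-known signatures for PDF, JPEG, and PNG. The function names and the 10 MB size cap are assumptions added for illustration, and a production bot would still run a virus scan afterwards.

```python
from typing import Optional

# Leading byte signatures for the file types this bot accepts.
MAGIC_SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_mime_type(content: bytes) -> Optional[str]:
    """Return the allowed MIME type whose signature matches, else None."""
    for mime_type, signature in MAGIC_SIGNATURES.items():
        if content.startswith(signature):
            return mime_type
    return None


def is_allowed_upload(content: bytes, max_bytes: int = 10 * 1024 * 1024) -> bool:
    """Accept only non-empty files under the size cap with a known signature."""
    return 0 < len(content) <= max_bytes and sniff_mime_type(content) is not None
```

In a custom action, you could call is_allowed_upload(response.content) right after the download and reject the file before it reaches storage.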
Managing Rasa Model Artifacts
People often forget to manage the Rasa models themselves. When you run rasa train, the output is a compressed .tar.gz file containing your trained model. In production, managing these files keeps things reliable.
Model Versioning Strategy
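Never overwrite your models. Give every model a unique name. By default, Rasa names each model with a timestamp and a random suffix (e.g., 20260213-100000-stochastic-lion.tar.gz). If you also want the commit hash in the name, set it yourself with rasa train --fixed-model-name. Unique names let you roll back instantly to a working version if the latest model performs poorly.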
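Because the timestamp prefix has a fixed width, model names sort chronologically as plain strings. The sketch below uses that to pick a rollback target. Both helper names are hypothetical, and the sketch assumes every model keeps the timestamp-first naming shown above.

```python
import re
from typing import List, Optional

# Timestamp-first archive names, e.g. 20260213-100000-stochastic-lion.tar.gz
MODEL_NAME = re.compile(r"^\d{8}-\d{6}-.+\.tar\.gz$")


def sort_models_newest_first(names: List[str]) -> List[str]:
    """Keep names with a timestamp prefix and order them newest first."""
    return sorted((n for n in names if MODEL_NAME.match(n)), reverse=True)


def rollback_candidate(names: List[str], current: str) -> Optional[str]:
    """Return the newest model that is older than the current one, if any."""
    older = [n for n in sort_models_newest_first(names) if n < current]
    return older[0] if older else None
```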
Remote Model Storage
In a containerized environment (Docker/Kubernetes), your Rasa server should not rely on a local models/ directory. Instead, configure your Rasa server to load models from a remote URL or an object storage bucket (like S3 or Fast.io).
You can configure this in your deployment setup. For example, when running the Rasa server, you can point it to a remote model server or use a startup script to pull the latest "tagged" production model from storage before starting the service. This makes sure your containers always serve the correct version of your assistant.
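A startup script for this could look roughly like the sketch below, which downloads one model archive into the models directory before the server starts. The function name, URL, and file name are placeholders, and the sketch assumes the storage service exposes the model over a plain HTTPS link. A private bucket would need authenticated requests instead.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_model(model_url: str, models_dir: str, filename: str) -> Path:
    """Download a model archive into models_dir and return its local path."""
    target_dir = Path(models_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    partial = target.with_name(target.name + ".part")
    # Write to a temporary name first so a failed download never leaves
    # a truncated archive where the server expects a complete model.
    with urllib.request.urlopen(model_url, timeout=60) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

After the download finishes, the container entrypoint can start the server against that file, for example with rasa run --model models/production.tar.gz.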
Automated Cleanup
Model files can be large (hundreds of megabytes), especially if you include heavy transformers like BERT or RoBERTa. Set up a lifecycle policy on your storage bucket to move old models to cold storage (or delete them) after a set period (e.g., 90 days). Keep only the active production model and a few recent candidates for fallback.
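If your storage service has no built-in lifecycle rules, the same policy can be approximated in a scheduled job. The sketch below only decides which models qualify. The function name and the keep_recent default of 3 are assumptions, and the actual move or delete call depends on your storage client.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def models_to_archive(
    modified: Dict[str, datetime],
    active: str,
    now: datetime,
    max_age_days: int = 90,
    keep_recent: int = 3,
) -> List[str]:
    """Return model names that are past the age limit and safe to archive."""
    newest_first = sorted(modified, key=modified.get, reverse=True)
    # Never touch the active model or the most recent fallback candidates.
    protected = set(newest_first[:keep_recent]) | {active}
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name in newest_first
        if name not in protected and modified[name] < cutoff
    )
```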
Persistent Storage Solutions for Rasa
Production Rasa chatbots need storage that the action server can access but that lives outside its short lifecycle.
Object Storage (S3, GCS, Fast.io)
Most teams move files to cloud object storage. It offers durability, scalability, and accessibility. Fast.io works well for AI agents because it offers an MCP (Model Context Protocol) server. This allows your Rasa bot, if connected to LLM logic, to work with files using natural language rather than rigid API calls.
Database Storage vs. File Storage
A common mistake is storing binary data (BLOBs) directly in the tracker store (like PostgreSQL). This fills up the database, making backups slow and conversation retrieval sluggish.
The Hybrid Approach:
- Store the File in Object Storage: Upload the actual PDF or image to Fast.io or S3.
- Store the Reference in the Tracker: Save the public URL or file path as a slot value in Rasa.
This separation keeps your database fast and lightweight while your files are served via CDNs, reducing latency for users.
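One way to picture the reference side of this split is a small, JSON-friendly record that describes the stored object. The helper name and the field names below are illustrative assumptions. The checksum is an extra that lets you detect later whether the stored file changed.

```python
import hashlib
from typing import Any, Dict


def build_file_reference(content: bytes, storage_key: str, mime_type: str) -> Dict[str, Any]:
    """Describe a stored file in a small dict that is cheap to keep in a slot."""
    return {
        "storage_key": storage_key,  # where the object lives in the bucket
        "mime_type": mime_type,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),  # content fingerprint
    }
```

Inside a custom action, the record could be returned with rasa_sdk's SlotSet event, for example SlotSet("uploaded_file", reference), where uploaded_file is a hypothetical slot defined in your domain.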
Fast.io Agent Storage
Fast.io gives AI agents their own dedicated storage layer. The free agent tier includes 50GB of storage that your Rasa bot can access via standard APIs or MCP tools. Your bot can save user uploads, generate PDF receipts on the fly, and share them via secure public links instantly.
Security and Compliance for Chatbot Files
You are responsible for the security of any file a user uploads to your chatbot. This matters most if you handle sensitive data like IDs, financial documents, or medical records.
Data Expiry Policies
Do not keep user files forever. Limit your risk with automated retention policies. For example, configure your storage bucket to delete user uploads automatically after 30 days unless you flag them for long-term retention.
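Where the bucket cannot enforce this itself, a scheduled job can apply the same rule. The sketch below assumes each upload is tracked as a dict with key, uploaded_at, and an optional retain flag. Those field names and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def expired_uploads(
    uploads: List[Dict[str, Any]], now: datetime, retention_days: int = 30
) -> List[str]:
    """Return keys of uploads older than the retention window and not flagged."""
    cutoff = now - timedelta(days=retention_days)
    return [
        upload["key"]
        for upload in uploads
        if not upload.get("retain", False) and upload["uploaded_at"] < cutoff
    ]
```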
Access Control
Ensure that the links your bot generates for users are secure. If your bot generates a receipt, that link should not be publicly guessable. Fast.io handles this by generating secure, tokenized links for every shared file, ensuring that only the intended recipient can access the content.
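If you serve files yourself, one generic way to make links hard to guess is to sign them with an HMAC and an expiry time. The sketch below shows that general technique only. It is not a description of how Fast.io builds its links, and the function names, query parameter names, and base URL are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode


def _signature(file_key: str, expires_at: int, secret: bytes) -> str:
    """Compute the HMAC-SHA256 token for a file key and expiry timestamp."""
    message = f"{file_key}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_link(base_url: str, file_key: str, secret: bytes, expires_at: int) -> str:
    """Build a download URL that carries the file key, expiry, and token."""
    query = urlencode({
        "file": file_key,
        "expires": expires_at,
        "token": _signature(file_key, expires_at, secret),
    })
    return f"{base_url}?{query}"


def verify_link(file_key: str, expires_at: int, token: str, secret: bytes,
                now: Optional[int] = None) -> bool:
    """Accept the link only if it has not expired and the token matches."""
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(_signature(file_key, expires_at, secret), token)
```

The secret stays on the server, so a user who changes the file key or the expiry in the URL ends up with a token that no longer verifies.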
Audit Logs
Keep a full log of file access. You need to know exactly when a file was uploaded, which user ID uploaded it, who accessed it, and when it was deleted. You need this visibility to comply with the privacy and security requirements that apply to your data.
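A simple starting point is an append-only log with one JSON record per file event. The sketch below assumes three event types (upload, access, delete) and a local log file. The function names and field names are illustrative, and a production setup would more likely ship these records to a central log service.

```python
import json
from datetime import datetime, timezone
from typing import Optional

ALLOWED_ACTIONS = {"upload", "access", "delete"}


def audit_record(action: str, file_key: str, user_id: str,
                 when: Optional[datetime] = None) -> str:
    """Serialize one file event as a single JSON line."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown audit action: {action}")
    timestamp = when or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": timestamp.isoformat(),
        "action": action,
        "file_key": file_key,
        "user_id": user_id,
    }, sort_keys=True)


def append_audit_record(log_path: str, record: str) -> None:
    """Append the record to the log file without rewriting earlier entries."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(record + "\n")
```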
Frequently Asked Questions
Where does Rasa store conversation data?
Rasa stores conversation history in a Tracker Store. By default, this is an in-memory store that is lost when the server restarts. For production environments, you must configure a persistent database like PostgreSQL, DynamoDB, or MongoDB in your `endpoints.yml` file to preserve conversation history.
How do I handle file uploads in Rasa?
Rasa does not natively handle file streams. You must use a Custom Action to handle file uploads. The action typically retrieves a file URL from the input channel's payload (e.g., Slack or Telegram), downloads the file securely to a temporary location, and then uploads it to a persistent storage service like Fast.io or S3.
Can Rasa read and understand PDF files?
Rasa NLU processes text, not binary files. However, you can write a Custom Action that uses Python libraries like `PyPDF2`, `pdfminer`, or OCR tools to extract text from a user-uploaded PDF. Once the text is extracted, you can set it as a slot value or feed it back into the NLU pipeline for intent classification.
What is the best way to structure Rasa training data?
For scalable projects, split your `nlu.yml` and `stories.yml` files into smaller, modular files stored in subdirectories within the `data/` folder (e.g., `data/nlu/faq.yml`, `data/nlu/chitchat.yml`). Rasa will recursively read all files in the directory during training, making teamwork and version control much easier.
How should I manage large Rasa model files?
Use a remote model storage strategy. Instead of keeping models on the local server, upload trained `.tar.gz` models to cloud storage (like Fast.io or S3). Configure your production Rasa server to pull the active model from this remote source at startup, ensuring consistent deployments across all server instances.