How to Manage LlamaIndex Storage for Production RAG Applications
LlamaIndex storage handles the persistence of document embeddings, index metadata, and raw document nodes required for RAG applications. By default, everything lives in memory and disappears when your script exits. This guide walks through StorageContext configuration, the differences between vector stores and document stores, and how to manage the source files your pipeline depends on.
What Is LlamaIndex StorageContext?
StorageContext is the primary class for managing persistence in LlamaIndex. It acts as a container that groups together the different data stores your RAG application generates during ingestion. When you load documents and build an index, LlamaIndex creates three categories of data:
- Vector Store holds the embedding vectors (arrays of floats) that represent the semantic meaning of your text chunks. These are what power similarity search.
- Document Store holds the actual text content, stored as `Node` objects. When the LLM needs to generate an answer, it pulls the raw text from here.
- Index Store holds structural metadata about how your index is organized, including references between nodes and the index configuration.

There is also an optional Graph Store for knowledge graph triplets, but most RAG applications only use the first three.

By default, all of these stores use in-memory implementations (`SimpleVectorStore`, `SimpleDocumentStore`, `SimpleIndexStore`). That means your embeddings, text chunks, and metadata all vanish when the Python process ends.

```python
from llama_index.core import StorageContext

# Default: everything in memory
storage_context = StorageContext.from_defaults()
```

For anything beyond a quick prototype, you need to persist this data.
How to Persist LlamaIndex Data to Disk
The fast way to add persistence is writing to the local file system. This works well for single-agent setups, local development, and scripts that run on a schedule.
### Saving Your Index
After building an index, call `persist()` on the storage context. This writes JSON files for each store type into a directory you specify.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents and build the index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Save everything to disk
index.storage_context.persist(persist_dir="./storage")
```

After running this, your `./storage` directory will contain:
- `docstore.json` - your text chunks and node relationships
- `vector_store.json` - the embedding vectors
- `index_store.json` - index structure and metadata
- `graph_store.json` - empty unless you use a knowledge graph
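Before reloading, it can help to verify those files actually exist, since a stale or partial persist directory is a common failure mode. A minimal stdlib sketch, assuming the default file names listed above (`storage_is_complete` is a hypothetical helper, not part of LlamaIndex):

```python
import tempfile
from pathlib import Path

# Core store files written by persist() under the default configuration
EXPECTED_FILES = ("docstore.json", "vector_store.json", "index_store.json")

def storage_is_complete(persist_dir: str) -> bool:
    """Return True if every core store file is present in persist_dir."""
    base = Path(persist_dir)
    return all((base / name).is_file() for name in EXPECTED_FILES)

# Demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(storage_is_complete(tmp))        # False: nothing persisted yet
    for name in EXPECTED_FILES:
        (Path(tmp) / name).write_text("{}")
    print(storage_is_complete(tmp))        # True: all three stores on disk
```

Gating your load path on a check like this lets you fall back to a full re-ingest instead of crashing on a missing file.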
### Loading From Disk
To restore your index without re-ingesting, rebuild the `StorageContext` from those files and pass it to `load_index_from_storage`.

```python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild from saved files
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Ready to query
query_engine = index.as_query_engine()
response = query_engine.query("What are our Q4 results?")
```

This skips the entire ingestion and embedding process. For a corpus with thousands of documents, that can save hours of processing time and significant API costs.
When Local Persistence Falls Short
Local JSON files work until they don't. Common pain points:
- File size: A 100,000-document corpus can produce multi-gigabyte JSON files that are slow to load
- Concurrency: Two agents cannot safely write to the same JSON file simultaneously
- Durability: Local disk can fail, and there is no built-in backup
- Deployment: Containerized agents lose local state on restart unless you mount persistent volumes
For production systems, you need dedicated database backends.
Give Your AI Agents Persistent File Storage for LlamaIndex Storage
Fast.io gives teams shared workspaces, MCP tools, and searchable file context to run LlamaIndex storage workflows with reliable agent and human handoffs.
Vector Stores vs. Document Stores: What Each One Does
A common misconception is that the vector store is the only storage layer that matters. In practice, the document store is equally important, and losing it breaks your RAG pipeline even if your vectors are intact. Here is why both matter:
Vector Store
The vector store holds embedding vectors, the numerical representations of your text chunks. When a user sends a query, LlamaIndex converts the query into an embedding and performs a similarity search against these vectors to find relevant content. LlamaIndex supports over 20 vector database providers. The most common ones:
- Chroma - open-source, runs locally, good for prototyping
- Pinecone - managed service, scales well, low operational overhead
- Milvus/Zilliz - high-performance, good for large-scale deployments
- PostgreSQL with pgvector - add vector search to your existing Postgres database
- Qdrant - open-source with managed cloud option
One important detail: many vector databases (Pinecone, Chroma, Qdrant) store both the embedding and the original text together. When you use one of these, you may not need a separate document store at all, since the vector DB handles both roles. FAISS is the notable exception. It stores only vectors, so you always need a separate document store alongside it.
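To make the vector store's role concrete, here is a toy similarity search in plain Python: cosine similarity over hand-picked 4-dimensional embeddings. Real stores use hundreds of dimensions and approximate-nearest-neighbor indexes; the data and the `top_k` helper are illustrative, not LlamaIndex APIs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": chunk id -> embedding (illustrative data)
vector_store = {
    "chunk-1": [0.9, 0.1, 0.0, 0.0],
    "chunk-2": [0.0, 0.8, 0.6, 0.0],
    "chunk-3": [0.85, 0.15, 0.05, 0.0],
}

def top_k(query_embedding, k=2):
    """Rank stored chunks by similarity to the query embedding."""
    scored = sorted(
        vector_store.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]

print(top_k([1.0, 0.0, 0.0, 0.0]))  # -> ['chunk-1', 'chunk-3']
```

Note that the result is a list of chunk ids, not text. Something still has to map those ids back to content, which is exactly the document store's job.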
Document Store
The document store preserves the actual text chunks (Node objects), their metadata, and parent-child relationships between nodes. If your vector store only holds embeddings (like FAISS), the document store is what provides the text that gets sent to the LLM for answer generation. Supported backends include MongoDB, Redis, Firestore, DynamoDB, and the default local file system.
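The division of labor is easy to see in a toy pairing: the vector store returns chunk ids, and the document store turns those ids into prompt text. The dict-backed store and `build_context` helper below are illustrative assumptions, not LlamaIndex APIs.

```python
# Toy document store: chunk id -> original text (illustrative data)
doc_store = {
    "chunk-1": "Q4 revenue grew 12% year over year.",
    "chunk-3": "Operating margin improved to 21% in Q4.",
}

def build_context(chunk_ids):
    """Assemble the prompt context from retrieved chunk ids.

    A missing id raises KeyError -- the failure you hit when the
    vectors survive but the document store is lost.
    """
    return "\n".join(doc_store[cid] for cid in chunk_ids)

print(build_context(["chunk-1", "chunk-3"]))
```

This is why a vector store alone is not enough with a backend like FAISS: similarity search can still find `chunk-1`, but without the document store there is no text to send to the LLM.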
Comparison Table

| | Vector Store | Document Store |
|---|---|---|
| Holds | Embedding vectors | Text chunks (`Node` objects), metadata, node relationships |
| Role in retrieval | Similarity search against the query embedding | Supplies the raw text sent to the LLM |
| Example backends | Chroma, Pinecone, Milvus/Zilliz, pgvector, Qdrant, FAISS | MongoDB, Redis, Firestore, DynamoDB, local JSON files |

For many production setups, using a vector database that stores both embeddings and text (like Pinecone or Chroma) is the simplest path. You get persistence, scalability, and concurrency without managing multiple databases.
The Gap Most Guides Miss: Managing Source Files
LlamaIndex documentation covers vector stores and document stores in detail. What it does not cover is where your original files live and how your agents access them. Most tutorials start with `SimpleDirectoryReader("./data")`, assuming your PDFs, CSVs, and documents are sitting in a local folder. That works for a tutorial, but production RAG systems face a different reality:
- Source files are often gigabytes or terabytes in size
- Multiple agents need concurrent access to the same corpus
- Files come from different sources (Google Drive, client uploads, internal systems)
- You need version control and audit trails for compliance
- Teams need to review, update, and manage the source material alongside the AI pipeline
This is where cloud file storage fits into your LlamaIndex architecture. Your pipeline looks like this:
```
Source Files (cloud storage)
  -> SimpleDirectoryReader or custom loader
    -> LlamaIndex ingestion pipeline
      -> Vector Store (embeddings)
      -> Document Store (text chunks)
      -> Index Store (metadata)
```
The source layer sits outside LlamaIndex entirely, but it determines how reliable, scalable, and maintainable your RAG system is.

Fast.io handles this well. AI agents can sign up for their own accounts, create workspaces, and store source files with 50GB of free storage. Your agent reads files via API, processes them through LlamaIndex, and persists the index to your vector database. The source files stay organized in Fast.io, separate from the index data.

What you get over local storage or S3:
- Built-in RAG: Fast.io's Intelligence Mode auto-indexes files. You can query documents with natural language and get cited answers without building a separate LlamaIndex pipeline for simple use cases
- URL Import: Pull files from Google Drive, OneDrive, Box, or Dropbox into a workspace. No local I/O needed
- 251 MCP tools: Access files through the Model Context Protocol. Any MCP-compatible agent can interact with your file storage natively
- Ownership transfer: An agent builds a workspace with source files and indexed data, then transfers ownership to a human client while keeping admin access
Production Setup: Custom Database Backends
Once you outgrow local JSON files, you configure StorageContext with database backends for each store. This gives you durability, concurrent access, and the ability to scale each layer on its own.
### Example: Pinecone + MongoDB
A common production stack pairs Pinecone for vector search with MongoDB for document and index storage.

```python
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.core import StorageContext, VectorStoreIndex

# Configure each store
vector_store = PineconeVectorStore(
    api_key="your-pinecone-key",
    index_name="my-rag-index",
)
doc_store = MongoDocumentStore.from_uri(
    uri="mongodb+srv://user:pass@cluster.mongodb.net"
)
index_store = MongoIndexStore.from_uri(
    uri="mongodb+srv://user:pass@cluster.mongodb.net"
)

# Assemble the storage context
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    docstore=doc_store,
    index_store=index_store,
)

# Build or load an index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```

With this setup, your data lives in purpose-built systems. Pinecone handles similarity search. MongoDB handles document storage with rich querying. Your application can restart, scale horizontally, or move between servers without losing state.
### Example: PostgreSQL with pgvector

If you prefer keeping everything in one database, pgvector adds vector search capabilities to PostgreSQL.

```python
from llama_index.vector_stores.postgres import PGVectorStore
from llama_index.core import StorageContext

vector_store = PGVectorStore.from_params(
    database="rag_db",
    host="localhost",
    password="your-password",
    port=5432,
    table_name="embeddings",
    embed_dim=1536,
)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store
)
```

This simplifies operations since you only manage one database, though it may not scale as well as dedicated vector databases for large corpora.
Choosing Your Backend

As a rule of thumb from the options above: Chroma for local prototyping, Pinecone for a managed service with low operational overhead, Milvus/Zilliz for large-scale deployments, pgvector to keep everything in an existing Postgres database, and Qdrant for open source with a managed cloud option. Whichever you pick for vectors, back the document store and index store with a database too if more than one process touches them.
Incremental Updates and Index Management
A production RAG system is not "set and forget." Documents get added, updated, and deleted. Your storage needs to handle these changes without rebuilding the whole index.
### Adding New Documents
Load the existing index from storage, insert new documents, and persist again.

```python
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Load existing index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Add new documents
new_docs = SimpleDirectoryReader("./new_data").load_data()
for doc in new_docs:
    index.insert(doc)

# Persist updates
index.storage_context.persist(persist_dir="./storage")
```

If you use a managed vector database like Pinecone, the vectors are persisted automatically on insertion. You only need to explicitly persist the document store and index store.
### Deleting Documents
To remove a document and its associated nodes:

```python
index.delete_ref_doc("doc_id_to_remove", delete_from_docstore=True)
```

This removes both the vector embeddings and the source nodes. Without the `delete_from_docstore=True` flag, the text chunks would remain orphaned in the document store.
### Handling Updates
LlamaIndex does not have a built-in "update" operation. The recommended approach is delete-then-insert:

1. Delete the old version using `delete_ref_doc()`
2. Insert the updated document with `index.insert()`
3. Persist the changes
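Sketched against toy dict-backed stores rather than the LlamaIndex API, the pattern looks like this; the point is that both stores are cleared before the new version is written, so no orphaned chunks remain:

```python
def upsert_document(doc_id, text, embedding, vector_store, doc_store):
    """Delete-then-insert: remove any old version from BOTH stores,
    then write the new one, keeping vectors and text in sync.

    Toy dict-backed stores for illustration, not LlamaIndex objects.
    """
    vector_store.pop(doc_id, None)   # delete old embedding, if present
    doc_store.pop(doc_id, None)      # delete old text too, so nothing is orphaned
    vector_store[doc_id] = embedding
    doc_store[doc_id] = text

vectors, docs = {}, {}
upsert_document("doc-1", "v1 of the report", [0.1, 0.2], vectors, docs)
upsert_document("doc-1", "v2 of the report", [0.3, 0.4], vectors, docs)
print(docs["doc-1"])  # -> v2 of the report
```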
For systems where source files change often, build a sync pipeline that tracks file modifications (via timestamps or checksums) and only re-processes what changed. Fast.io's webhook system can trigger your ingestion pipeline when source files are uploaded or modified.
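A minimal checksum-based change tracker can be built with only the standard library. The manifest layout and helper names below are assumptions, and the manifest file should live outside the source directory so it is not tracked as a document itself:

```python
import hashlib
import json
from pathlib import Path

def file_checksum(path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def changed_files(source_dir, manifest_path):
    """Return files in source_dir that are new or modified since the
    last run, tracked via a JSON manifest of checksums."""
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text()) if manifest.is_file() else {}
    current = {
        p.name: file_checksum(p)
        for p in sorted(Path(source_dir).iterdir())
        if p.is_file()
    }
    manifest.write_text(json.dumps(current))
    return [name for name, digest in current.items() if previous.get(name) != digest]
```

Feed the returned list into the delete-then-insert flow above on each run, and unchanged documents are never re-embedded.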
Frequently Asked Questions
How do I save my LlamaIndex index locally?
Call `index.storage_context.persist(persist_dir='./storage')` after building your index. This writes JSON files for the vector store, document store, and index store to the specified directory. To reload, use `StorageContext.from_defaults(persist_dir='./storage')` followed by `load_index_from_storage(storage_context)`.
What is LlamaIndex StorageContext?
StorageContext is the container class that groups together the vector store, document store, index store, and graph store. It is the central interface for configuring where and how LlamaIndex persists data. You create one with `StorageContext.from_defaults()` and can pass custom store implementations for each component.
How do I persist a LlamaIndex vector store?
For the default SimpleVectorStore, call `storage_context.persist()` to write a JSON file to disk. For managed vector databases like Pinecone or Chroma, persistence is automatic. Pass the vector store client to `StorageContext.from_defaults(vector_store=your_store)` and data is saved on each insertion.
Can I use LlamaIndex with S3 or cloud object storage?
Yes. LlamaIndex supports fsspec-compatible filesystems. You can pass an S3 filesystem object to the `persist()` method to write index files directly to a bucket. For source documents, tools like Fast.io let agents access files via API without downloading them locally first.
What happens if I lose my LlamaIndex document store?
If your vector store is intact but the document store is lost, your RAG agent can still find relevant matches through similarity search, but it will not be able to retrieve the original text to send to the LLM. You would need to re-ingest your source documents to rebuild the document store. This is why backing up both stores is important.