How to Manage LlamaIndex Storage for Production RAG Applications
LlamaIndex storage handles the persistence of document embeddings, index metadata, and raw document nodes required for RAG applications. By default, everything lives in memory and disappears when your script exits. This guide walks through StorageContext configuration, the differences between vector stores and document stores, and how to manage the source files your pipeline depends on.
What Is LlamaIndex StorageContext?
StorageContext is the primary class for managing persistence in LlamaIndex. It acts as a container that groups together the different data stores your RAG application generates during ingestion. When you load documents and build an index, LlamaIndex creates three categories of data:
- Vector Store holds the embedding vectors (arrays of floats) that represent the semantic meaning of your text chunks. These are what power similarity search.
- Document Store holds the actual text content, stored as `Node` objects. When the LLM needs to generate an answer, it pulls the raw text from here.
- Index Store holds structural metadata about how your index is organized, including references between nodes and the index configuration.

There is also an optional Graph Store for knowledge graph triplets, but most RAG applications only use the first three.

By default, all of these stores use in-memory implementations (`SimpleVectorStore`, `SimpleDocumentStore`, `SimpleIndexStore`). That means your embeddings, text chunks, and metadata all vanish when the Python process ends.

```python
from llama_index.core import StorageContext

# Default: everything in memory
storage_context = StorageContext.from_defaults()
```

For anything beyond a quick prototype, you need to persist this data.
How to Persist LlamaIndex Data to Disk
The fast way to add persistence is writing to the local file system. This works well for single-agent setups, local development, and scripts that run on a schedule.
### Saving Your Index
After building an index, call `persist()` on the storage context. This writes JSON files for each store type into a directory you specify.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents and build the index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Save everything to disk
index.storage_context.persist(persist_dir="./storage")
```

After running this, your `./storage` directory will contain:
- `docstore.json` - your text chunks and node relationships
- `vector_store.json` - the embedding vectors
- `index_store.json` - index structure and metadata
- `graph_store.json` - empty unless you use a knowledge graph
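Before reloading, it can help to verify those files actually exist, since a stale or partial persist directory is a common failure mode. A minimal stdlib sketch, assuming the default file names listed above (`storage_is_complete` is a hypothetical helper, not part of LlamaIndex):

```python
import tempfile
from pathlib import Path

# Core store files written by persist() under the default configuration
EXPECTED_FILES = ("docstore.json", "vector_store.json", "index_store.json")

def storage_is_complete(persist_dir: str) -> bool:
    """Return True if every core store file is present in persist_dir."""
    base = Path(persist_dir)
    return all((base / name).is_file() for name in EXPECTED_FILES)

# Demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(storage_is_complete(tmp))        # False: nothing persisted yet
    for name in EXPECTED_FILES:
        (Path(tmp) / name).write_text("{}")
    print(storage_is_complete(tmp))        # True: all three stores on disk
```

Gating your load path on a check like this lets you fall back to a full re-ingest instead of crashing on a missing file.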
### Loading From Disk
To restore your index without re-ingesting, rebuild the `StorageContext` from those files and pass it to `load_index_from_storage`.

```python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild from saved files
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Ready to query
query_engine = index.as_query_engine()
response = query_engine.query("What are our Q4 results?")
```

This skips the entire ingestion and embedding process. For a corpus with thousands of documents, that can save hours of processing time and significant API costs.
When Local Persistence Falls Short
Local JSON files work until they don't. Common pain points:
- File size: A 100,000-document corpus can produce multi-gigabyte JSON files that are slow to load
- Concurrency: Two agents cannot safely write to the same JSON file simultaneously
- Durability: Local disk can fail, and there is no built-in backup
- Deployment: Containerized agents lose local state on restart unless you mount persistent volumes
For production systems, you need dedicated database backends.
Give Your AI Agents Persistent File Storage for LlamaIndex Storage
Fast.io gives teams shared workspaces, MCP tools, and searchable file context to run LlamaIndex storage workflows with reliable agent and human handoffs.
Vector Stores vs. Document Stores: What Each One Does
A common misconception is that the vector store is the only storage layer that matters. In practice, the document store is equally important, and losing it breaks your RAG pipeline even if your vectors are intact. Here is why both matter:
Vector Store
The vector store holds embedding vectors, the numerical representations of your text chunks. When a user sends a query, LlamaIndex converts the query into an embedding and performs a similarity search against these vectors to find relevant content. LlamaIndex supports over 20 vector database providers. The most common ones:
- Chroma - open-source, runs locally, good for prototyping
- Pinecone - managed service, scales well, low operational overhead
- Milvus/Zilliz - high-performance, good for large-scale deployments
- PostgreSQL with pgvector - add vector search to your existing Postgres database
- Qdrant - open-source with managed cloud option
One important detail: many vector databases (Pinecone, Chroma, Qdrant) store both the embedding and the original text together. When you use one of these, you may not need a separate document store at all, since the vector DB handles both roles. FAISS is the notable exception. It stores only vectors, so you always need a separate document store alongside it.
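To make the vector store's role concrete, here is a toy similarity search in plain Python: cosine similarity over hand-picked 4-dimensional embeddings. Real stores use hundreds of dimensions and approximate-nearest-neighbor indexes; the data and the `top_k` helper are illustrative, not LlamaIndex APIs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": chunk id -> embedding (illustrative data)
vector_store = {
    "chunk-1": [0.9, 0.1, 0.0, 0.0],
    "chunk-2": [0.0, 0.8, 0.6, 0.0],
    "chunk-3": [0.85, 0.15, 0.05, 0.0],
}

def top_k(query_embedding, k=2):
    """Rank stored chunks by similarity to the query embedding."""
    scored = sorted(
        vector_store.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]

print(top_k([1.0, 0.0, 0.0, 0.0]))  # -> ['chunk-1', 'chunk-3']
```

Note that the result is a list of chunk ids, not text. Something still has to map those ids back to content, which is exactly the document store's job.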
Document Store
The document store preserves the actual text chunks (Node objects), their metadata, and parent-child relationships between nodes. If your vector store only holds embeddings (like FAISS), the document store is what provides the text that gets sent to the LLM for answer generation. Supported backends include MongoDB, Redis, Firestore, DynamoDB, and the default local file system.
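The division of labor is easy to see in a toy pairing: the vector store returns chunk ids, and the document store turns those ids into prompt text. The dict-backed store and `build_context` helper below are illustrative assumptions, not LlamaIndex APIs.

```python
# Toy document store: chunk id -> original text (illustrative data)
doc_store = {
    "chunk-1": "Q4 revenue grew 12% year over year.",
    "chunk-3": "Operating margin improved to 21% in Q4.",
}

def build_context(chunk_ids):
    """Assemble the prompt context from retrieved chunk ids.

    A missing id raises KeyError -- the failure you hit when the
    vectors survive but the document store is lost.
    """
    return "\n".join(doc_store[cid] for cid in chunk_ids)

print(build_context(["chunk-1", "chunk-3"]))
```

This is why a vector store alone is not enough with a backend like FAISS: similarity search can still find `chunk-1`, but without the document store there is no text to send to the LLM.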
Comparison Table

| | Vector Store | Document Store |
|---|---|---|
| Holds | Embedding vectors | Text chunks (`Node` objects), metadata, node relationships |
| Role in retrieval | Similarity search against the query embedding | Supplies the raw text sent to the LLM |
| Example backends | Chroma, Pinecone, Milvus/Zilliz, pgvector, Qdrant, FAISS | MongoDB, Redis, Firestore, DynamoDB, local JSON files |

For many production setups, using a vector database that stores both embeddings and text (like Pinecone or Chroma) is the simplest path. You get persistence, scalability, and concurrency without managing multiple databases.
The Gap Most Guides Miss: Managing Source Files
LlamaIndex documentation covers vector stores and document stores in detail. What it does not cover is where your original files live and how your agents access them. Most tutorials start with `SimpleDirectoryReader("./data")`, assuming your PDFs, CSVs, and documents are sitting in a local folder. That works for a tutorial, but production RAG systems face a different reality:
- Source files are often gigabytes or terabytes in size
- Multiple agents need concurrent access to the same corpus
- Files come from different sources (Google Drive, client uploads, internal systems)
- You need version control and audit trails for compliance
- Teams need to review, update, and manage the source material alongside the AI pipeline
This is where cloud file storage fits into your LlamaIndex architecture. Your pipeline looks like this:
```
Source Files (cloud storage)
  -> SimpleDirectoryReader or custom loader
    -> LlamaIndex ingestion pipeline
      -> Vector Store (embeddings)
      -> Document Store (text chunks)
      -> Index Store (metadata)
```
The source layer sits outside LlamaIndex entirely, but it determines how reliable, scalable, and maintainable your RAG system is.

Fast.io handles this well. AI agents can sign up for their own accounts, create workspaces, and store source files with 50GB of free storage. Your agent reads files via API, processes them through LlamaIndex, and persists the index to your vector database. The source files stay organized in Fast.io, separate from the index data.

What you get over local storage or S3:
- Built-in RAG: Fast.io's Intelligence Mode auto-indexes files. You can query documents with natural language and get cited answers without building a separate LlamaIndex pipeline for simple use cases
- URL Import: Pull files from Google Drive, OneDrive, Box, or Dropbox into a workspace. No local I/O needed
- 251 MCP tools: Access files through the Model Context Protocol. Any MCP-compatible agent can interact with your file storage natively
- Ownership transfer: An agent builds a workspace with source files and indexed data, then transfers ownership to a human client while keeping admin access
Production Setup: Custom Database Backends
Once you outgrow local JSON files, you configure StorageContext with database backends for each store. This gives you durability, concurrent access, and the ability to scale each layer on its own.
### Example: Pinecone + MongoDB
A common production stack pairs Pinecone for vector search with MongoDB for document and index storage.

```python
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.core import StorageContext, VectorStoreIndex

# Configure each store
vector_store = PineconeVectorStore(
    api_key="your-pinecone-key",
    index_name="my-rag-index",
)
doc_store = MongoDocumentStore.from_uri(
    uri="mongodb+srv://user:pass@cluster.mongodb.net"
)
index_store = MongoIndexStore.from_uri(
    uri="mongodb+srv://user:pass@cluster.mongodb.net"
)

# Assemble the storage context
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    docstore=doc_store,
    index_store=index_store,
)

# Build or load an index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```

With this setup, your data lives in purpose-built systems. Pinecone handles similarity search. MongoDB handles document storage with rich querying. Your application can restart, scale horizontally, or move between servers without losing state.
### Example: PostgreSQL with pgvector

If you prefer keeping everything in one database, pgvector adds vector search capabilities to PostgreSQL.

```python
from llama_index.vector_stores.postgres import PGVectorStore
from llama_index.core import StorageContext

vector_store = PGVectorStore.from_params(
    database="rag_db",
    host="localhost",
    password="your-password",
    port=5432,
    table_name="embeddings",
    embed_dim=1536,
)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store
)
```

This simplifies operations since you only manage one database, though it may not scale as well as dedicated vector databases for large corpora.
Choosing Your Backend

As a rule of thumb from the options above: Chroma for local prototyping, Pinecone for a managed service with low operational overhead, Milvus/Zilliz for large-scale deployments, pgvector to keep everything in an existing Postgres database, and Qdrant for open source with a managed cloud option. Whichever you pick for vectors, back the document store and index store with a database too if more than one process touches them.
Incremental Updates and Index Management
A production RAG system is not "set and forget." Documents get added, updated, and deleted. Your storage needs to handle these changes without rebuilding the whole index.
### Adding New Documents
Load the existing index from storage, insert new documents, and persist again.

```python
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Load existing index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Add new documents
new_docs = SimpleDirectoryReader("./new_data").load_data()
for doc in new_docs:
    index.insert(doc)

# Persist updates
index.storage_context.persist(persist_dir="./storage")
```

If you use a managed vector database like Pinecone, the vectors are persisted automatically on insertion. You only need to explicitly persist the document store and index store.
### Deleting Documents
To remove a document and its associated nodes:

```python
index.delete_ref_doc("doc_id_to_remove", delete_from_docstore=True)
```

This removes both the vector embeddings and the source nodes. Without the `delete_from_docstore=True` flag, the text chunks would remain orphaned in the document store.
### Handling Updates
LlamaIndex does not have a built-in "update" operation. The recommended approach is delete-then-insert:

1. Delete the old version using `delete_ref_doc()`
2. Insert the updated document with `index.insert()`
3. Persist the changes
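Sketched against toy dict-backed stores rather than the LlamaIndex API, the pattern looks like this; the point is that both stores are cleared before the new version is written, so no orphaned chunks remain:

```python
def upsert_document(doc_id, text, embedding, vector_store, doc_store):
    """Delete-then-insert: remove any old version from BOTH stores,
    then write the new one, keeping vectors and text in sync.

    Toy dict-backed stores for illustration, not LlamaIndex objects.
    """
    vector_store.pop(doc_id, None)   # delete old embedding, if present
    doc_store.pop(doc_id, None)      # delete old text too, so nothing is orphaned
    vector_store[doc_id] = embedding
    doc_store[doc_id] = text

vectors, docs = {}, {}
upsert_document("doc-1", "v1 of the report", [0.1, 0.2], vectors, docs)
upsert_document("doc-1", "v2 of the report", [0.3, 0.4], vectors, docs)
print(docs["doc-1"])  # -> v2 of the report
```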
For systems where source files change often, build a sync pipeline that tracks file modifications (via timestamps or checksums) and only re-processes what changed. Fast.io's webhook system can trigger your ingestion pipeline when source files are uploaded or modified.
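A minimal checksum-based change tracker can be built with only the standard library. The manifest layout and helper names below are assumptions, and the manifest file should live outside the source directory so it is not tracked as a document itself:

```python
import hashlib
import json
from pathlib import Path

def file_checksum(path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def changed_files(source_dir, manifest_path):
    """Return files in source_dir that are new or modified since the
    last run, tracked via a JSON manifest of checksums."""
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text()) if manifest.is_file() else {}
    current = {
        p.name: file_checksum(p)
        for p in sorted(Path(source_dir).iterdir())
        if p.is_file()
    }
    manifest.write_text(json.dumps(current))
    return [name for name, digest in current.items() if previous.get(name) != digest]
```

Feed the returned list into the delete-then-insert flow above on each run, and unchanged documents are never re-embedded.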
Frequently Asked Questions
How do I save my LlamaIndex index locally?
Call `index.storage_context.persist(persist_dir='./storage')` after building your index. This writes JSON files for the vector store, document store, and index store to the specified directory. To reload, use `StorageContext.from_defaults(persist_dir='./storage')` followed by `load_index_from_storage(storage_context)`.
What is LlamaIndex StorageContext?
StorageContext is the container class that groups together the vector store, document store, index store, and graph store. It is the central interface for configuring where and how LlamaIndex persists data. You create one with `StorageContext.from_defaults()` and can pass custom store implementations for each component.
How do I persist a LlamaIndex vector store?
For the default SimpleVectorStore, call `storage_context.persist()` to write a JSON file to disk. For managed vector databases like Pinecone or Chroma, persistence is automatic. Pass the vector store client to `StorageContext.from_defaults(vector_store=your_store)` and data is saved on each insertion.
Can I use LlamaIndex with S3 or cloud object storage?
Yes. LlamaIndex supports fsspec-compatible filesystems. You can pass an S3 filesystem object to the `persist()` method to write index files directly to a bucket. For source documents, tools like Fast.io let agents access files via API without downloading them locally first.
What happens if I lose my LlamaIndex document store?
If your vector store is intact but the document store is lost, your RAG agent can still find relevant matches through similarity search, but it will not be able to retrieve the original text to send to the LLM. You would need to re-ingest your source documents to rebuild the document store. This is why backing up both stores is important.