
How to Build Persistent Memory for Semantic Kernel Agents

Semantic Kernel memory lets AI agents store and retrieve information using vector embeddings. This guide covers setting up memory stores in C# and Python, moving past the built-in VolatileMemoryStore to production-ready persistence, and connecting a remote file system for large asset retrieval in RAG pipelines.

Fastio Editorial Team · 9 min read

What Is Semantic Kernel Memory?

Semantic Kernel memory is a set of abstractions, developed by Microsoft, that let AI agents store and retrieve information using vector embeddings. Instead of relying on raw keyword search, agents encode text into high-dimensional vectors and find related content by similarity.

The memory system sits between your agent logic and the underlying data store. You write to a common interface, and the SDK handles embedding generation, storage, and retrieval behind the scenes. Swap storage backends (in-memory, Redis, Azure AI Search, PostgreSQL) without rewriting agent code.

Semantic Kernel ships as an open-source SDK for C#, Python, and Java. The memory layer is one part of a broader toolkit that includes planners, plugins, and prompt management. For agents that need to recall past conversations, reference documents, or maintain state across sessions, the memory system is the foundation. The key components:

  • IMemoryStore / MemoryStore: The interface your code talks to. Handles collection creation, record upsert, and similarity search.
  • Embedding generators: Convert text to vectors using models like OpenAI's text-embedding-ada-002 or local alternatives.
  • Vector store connectors: Plug in to databases like Pinecone, Qdrant, Azure AI Search, Redis, and others.
  • TextMemoryPlugin: A built-in plugin that wires memory directly into your agent's prompt pipeline.
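Under the hood, "similarity" is just vector math. A minimal pure-Python sketch, with toy three-dimensional vectors standing in for real embeddings (real models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- illustrative only, not real model output.
store = {
    "vacation policy": [0.9, 0.1, 0.0],
    "deploy pipeline": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # pretend this encodes "how much time off do I get?"

best = max(store, key=lambda key: cosine_similarity(store[key], query))
print(best)  # the semantically closest entry
```

Every vector store connector discussed below ultimately runs a scaled-up, indexed version of this comparison.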

Semantic Memory vs. Kernel Memory Service

Microsoft's ecosystem has two memory-related projects with similar names. Knowing the difference saves you from picking the wrong tool.

Semantic Memory (The SDK Component)

This is the lightweight, in-process library included directly in the Semantic Kernel SDK. It provides a simple abstraction for saving and retrieving text embeddings. Works well for applications where you want to manually manage the chunking and storage of small text snippets.

Kernel Memory (The Service)

Kernel Memory (KM) is a separate, standalone open-source service built on top of Semantic Kernel. It handles the full document ingestion pipeline:

  • Data Ingestion: Upload PDF, Word, Excel, or Markdown files.
  • Pipeline Processing: Automatically parses, chunks, and embeds documents.
  • Storage: Saves vectors to a database (Qdrant, Azure AI Search) and raw files to blob storage.
  • Retrieval: Offers advanced search with citation generation and hybrid search.

For document-heavy RAG scenarios where you need to process many file types at scale, Kernel Memory is the better fit. For simpler use cases where you control the text input, the SDK's built-in Semantic Memory works fine.

Step-by-Step: Initializing Memory in C# and Python

Start by initializing a memory store. Below is how to set up VolatileMemoryStore (for testing) and a persistent vector store.

C# Example (using Qdrant)

In C#, use the MemoryBuilder to configure your backing store and embedding model.

using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.Qdrant;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Initialize the memory builder
var memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-3-small", "your-api-key")
    .WithQdrantMemoryStore("http://localhost:6333", 1536)
    .Build();

// Save a piece of information
await memory.SaveInformationAsync(
    collection: "employee_handbook",
    id: "info1",
    text: "Employees are entitled to 4 weeks of paid vacation.",
    description: "Vacation Policy"
);

// Search memory
var results = memory.SearchAsync("employee_handbook", "How much time off do I get?");

await foreach (var result in results)
{
    Console.WriteLine($"Answer: {result.Metadata.Text}");
}

Python Example (using VolatileMemoryStore)

In Python, the setup uses the SemanticTextMemory class directly for simple use cases.

import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.memory import SemanticTextMemory, VolatileMemoryStore

async def main():
    kernel = sk.Kernel()

    # Configure the embedding service
    embedding_gen = OpenAITextEmbedding(
        ai_model_id="text-embedding-3-small",
        api_key="your-api-key",
    )

    # VolatileMemoryStore for temporary, in-memory testing
    memory = SemanticTextMemory(
        storage=VolatileMemoryStore(),
        embeddings_generator=embedding_gen,
    )

    # Save info
    await memory.save_information(
        collection="history",
        id="info1",
        text="The project deadline is October 15th.",
    )

    # Retrieve info
    results = await memory.search("history", "When is the project due?")
    print(f"Found: {results[0].text}")

asyncio.run(main())

VolatileMemoryStore is fine for development. In production, you need something that survives restarts.


Moving to Persistent Vector Store Connectors

Semantic Kernel's vector store abstraction (the newer replacement for the legacy memory store interface) supports multiple backends. Each connector implements the same IVectorStore interface, so switching databases is a configuration change, not a rewrite.

Available Connectors

  • Azure AI Search: Full-text and vector hybrid search. Good for teams already on Azure.
  • Redis: Fast, widely deployed. Works well for agents that need low-latency retrieval.
  • PostgreSQL (pgvector): Add vector search to your existing Postgres database. No new infrastructure.
  • Pinecone: Managed vector database. Scales without ops overhead.
  • Qdrant: Open-source, self-hostable. Strong filtering and payload support.
  • MongoDB Atlas: Vector search built into your document database.
  • SQL Server: For teams with existing SQL Server deployments.
  • Faiss: Facebook's similarity search library for local, high-performance indexing.

Example: Swapping to Redis in C#

using Microsoft.SemanticKernel.Connectors.Redis;

// Replace VolatileMemoryStore with Redis
var memoryStore = new RedisMemoryStore("localhost:6379");

var memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-3-small", apiKey)
    .WithMemoryStore(memoryStore)
    .Build();

// Same API as before
await memory.SaveInformationAsync(
    collection: "agent-knowledge",
    id: "fact-42",
    text: "Customer onboarding takes an average of 3 business days."
);

Choosing a Backend

Pick your connector based on what you already run:

  • Already on Azure? Use Azure AI Search for native integration.
  • Need speed above all? Redis or Faiss.
  • Want minimal infrastructure? pgvector on your existing Postgres.
  • Fully managed? Pinecone or MongoDB Atlas.

The vector store abstraction is still in preview as of early 2026, but Microsoft has signaled it will replace the legacy IMemoryStore interface.
Fastio features

Give Your AI Agents Persistent Storage

Fastio gives teams shared workspaces, MCP tools, and searchable file context to run Semantic Kernel memory workflows with reliable agent and human handoffs.

The Large Asset Problem in RAG Pipelines

Standard RAG pipelines work well for text. Chunk a PDF into smaller segments, embed them, store the embeddings in a vector database. Done. But production agents often need to handle more than text. Consider these scenarios:

  • Video archives: You have terabytes of footage. You can embed the transcripts, but the agent might need the original video file to extract a clip or analyze a frame.
  • CAD models: Engineering teams need agents to find specific blueprints. The vector store holds the description ("Pump Assembly v2"), but the actual file is a 500MB proprietary format.
  • Generated reports: An agent produces a 15MB PDF. The embedding of the text goes into the vector store, but the PDF itself needs to live somewhere accessible.

Vector databases aren't built for storing large binary files. They store embeddings and metadata. The source files need a separate home.

A Hybrid Architecture

Split your storage into two layers:

  • Vector database (Qdrant, Pinecone, etc.): Stores embeddings, text chunks, and metadata including a file reference ID.
  • File store (cloud storage with API access): Stores the actual documents, videos, and artifacts.
User Query
    |
    v
Semantic Kernel Agent
    |
    +--> Vector Store (embeddings + metadata)
    |        Returns: matching chunks + file reference IDs
    |
    +--> File Store API (source documents)
    |        Returns: full document for deep context
    |
    v
LLM generates answer with citations
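The flow above can be mocked end-to-end with plain dictionaries; `vector_hits`, `file_store`, and `resolve_sources` are illustrative names, not SDK APIs:

```python
# Stand-in for a vector store query result: text chunks plus file reference IDs.
vector_hits = [
    {"text": "Pump Assembly v2 tolerances ...", "file_id": "cad-0042", "score": 0.91},
    {"text": "Pump housing material spec ...", "file_id": "cad-0042", "score": 0.88},
]

# Stand-in for the file store: reference ID -> retrievable asset.
file_store = {
    "cad-0042": {"name": "pump_assembly_v2.step", "size_mb": 512},
}

def resolve_sources(hits: list[dict], store: dict) -> dict:
    """Deduplicate file references from vector hits and fetch each asset once."""
    resolved = {}
    for hit in hits:
        file_id = hit["file_id"]
        if file_id not in resolved:
            resolved[file_id] = store[file_id]
    return resolved

sources = resolve_sources(vector_hits, file_store)
print(sources)  # one entry for "cad-0042", despite two matching chunks
```

In a real pipeline the two dictionaries become a vector database query and a file store API call, but the join key, the file reference ID stored in chunk metadata, stays the same.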

If you're building RAG pipelines that need persistent file storage, Fastio provides an API-first file store. Agents sign up for their own accounts, create workspaces to organize files by project, and upload or download via REST API or MCP server. The built-in Intelligence Mode handles RAG indexing on its own, so you can skip manual chunking and embedding for many use cases.

Best Practices for Production Memory Systems

Tutorials make Semantic Kernel memory look simple. Production is a different story. Here are patterns from teams that have actually shipped agent memory systems.

Separate Hot and Cold Storage

Not all memories need the same retrieval speed. Recent conversation context should be in a fast store (Redis, in-memory). Historical knowledge can live in a cheaper, higher-capacity store (PostgreSQL, cloud file storage).

Hot:  Last 24 hours of interactions  -->  Redis / VolatileMemoryStore
Warm: Project knowledge base          -->  Azure AI Search / Pinecone
Cold: Archived reports and files      -->  Cloud file storage (S3, Fastio)
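A tiering policy like this reduces to a small routing function. The tier names and age thresholds below are illustrative, not prescribed by the SDK:

```python
from datetime import datetime, timedelta, timezone

def pick_tier(last_accessed: datetime, now: datetime) -> str:
    """Route a memory record to a storage tier based on how recently it was used."""
    age = now - last_accessed
    if age <= timedelta(hours=24):
        return "hot"   # e.g. Redis / in-memory
    if age <= timedelta(days=90):
        return "warm"  # e.g. Azure AI Search / Pinecone
    return "cold"      # e.g. cloud file storage

now = datetime.now(timezone.utc)
print(pick_tier(now - timedelta(hours=2), now))   # hot
print(pick_tier(now - timedelta(days=30), now))   # warm
print(pick_tier(now - timedelta(days=400), now))  # cold
```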

Chunking Strategy Matters

Don't just split by character count. Use semantic chunking: split by paragraphs or markdown headers so each chunk contains a complete thought. Bad chunking is the number one cause of poor retrieval quality.
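A minimal sketch of header-based chunking for markdown input, using only the standard library (a production pipeline would also cap chunk size and add overlap):

```python
import re

def chunk_by_headers(markdown: str) -> list[str]:
    """Split markdown at headings so each chunk is one self-contained section."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # Start a new chunk whenever a heading line (#, ##, ...) appears.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

doc = "# Vacation\nEmployees get 4 weeks.\n\n# Sick Leave\nTen days per year."
for chunk in chunk_by_headers(doc):
    print(repr(chunk))  # each chunk carries its heading for context
```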

Enable Hybrid Search

Pure semantic search (vectors) can miss exact keywords like part numbers, error codes, or product SKUs. Enable hybrid search that combines vector similarity with keyword matching (BM25) for best results. Azure AI Search and Qdrant both support this.
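Hybrid-capable backends do the fusion for you, but the idea is easy to illustrate with reciprocal rank fusion (RRF), one common way to merge a vector ranking with a BM25 ranking:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: items ranked high in either list float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking = ["doc-a", "doc-b", "doc-c"]   # semantic similarity order
keyword_ranking = ["doc-c", "doc-a", "doc-d"]  # BM25 order (exact SKU match)
fused = rrf([vector_ranking, keyword_ranking])
print(fused)  # documents found by both methods rank highest
```

Documents that appear in both lists ("doc-a", "doc-c") outrank documents found by only one method, which is exactly the behavior you want for part numbers and error codes.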

Use Metadata Filtering

Tag your memories with categories. If a user asks about "HR policies," filter by category == 'HR' before running the vector search. This reduces the search space and improves accuracy.
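The pattern reduces to "filter first, then rank". A toy sketch with hypothetical records, using a dot product as a stand-in for the real similarity function:

```python
records = [
    {"text": "Four weeks paid vacation.", "category": "HR", "vec": [0.9, 0.1]},
    {"text": "Rotate API keys quarterly.", "category": "Security", "vec": [0.1, 0.9]},
    {"text": "Parental leave is 12 weeks.", "category": "HR", "vec": [0.8, 0.2]},
]

def filtered_search(records: list[dict], query_vec: list[float], category=None):
    """Pre-filter on metadata, then rank only the survivors by similarity."""
    candidates = [r for r in records if category is None or r["category"] == category]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return sorted(candidates, key=lambda r: dot(r["vec"], query_vec), reverse=True)

hits = filtered_search(records, query_vec=[0.85, 0.15], category="HR")
print([h["text"] for h in hits])  # only HR records, best match first
```

Real connectors push the category filter down into the database query, so the similarity search never touches the excluded records at all.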

Version Your Collections

When you update your embedding model or chunking strategy, old embeddings become incompatible. Use versioned collection names (docs-v2, docs-v3) and migrate gradually instead of reindexing everything at once.

Handle Multi-Agent Conflicts

If multiple agents write to the same memory store, you need coordination. Use file locks to prevent two agents from updating the same record simultaneously. This matters most for shared knowledge bases.
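One portable way to implement such a lock with only the standard library is atomic lock-file creation via `os.O_CREAT | os.O_EXCL`, which guarantees at most one process acquires the lock. This is a sketch, not a hardened implementation (no timeout or stale-lock recovery):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def record_lock(lock_dir: str, record_id: str):
    """Hold an exclusive per-record lock via atomic lock-file creation."""
    path = os.path.join(lock_dir, f"{record_id}.lock")
    # O_EXCL makes open() fail if another process already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)

lock_dir = tempfile.mkdtemp()
with record_lock(lock_dir, "fact-42"):
    updated = True  # safe to update the shared record inside this block
print(updated)
```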

Frequently Asked Questions

How does Semantic Kernel handle memory?

Semantic Kernel handles memory through an abstraction layer that connects your application to a vector database. It converts text into numerical embeddings using an embedding model (like OpenAI's text-embedding-3-small) and stores them alongside the original text. When queried, it converts your question into an embedding and uses cosine similarity to find the most relevant stored information. You can plug in different storage backends (Redis, Qdrant, Azure AI Search, PostgreSQL) without changing your agent code.

What is the VolatileMemoryStore in Semantic Kernel?

VolatileMemoryStore is a temporary, in-memory implementation of the memory interface provided by Semantic Kernel. It stores data in process memory, making it fast but ephemeral. All data is lost when the application stops or restarts. It is designed for development, testing, and demos, not production use. Because it implements the standard IMemoryStore interface, you can develop with VolatileMemoryStore and swap in a persistent backend (like Redis or PostgreSQL) when you deploy.

How do I add long-term memory to Semantic Kernel agents?

To add long-term memory, replace VolatileMemoryStore with a persistent vector store connector. Semantic Kernel supports Redis, Qdrant, Pinecone, Azure AI Search, PostgreSQL with pgvector, MongoDB Atlas, and others. Configure the connector during kernel initialization, and the rest of your code stays the same. For file-based memory (PDFs, images, reports), pair your vector store with a cloud file storage service that provides API access, since vector databases only store embeddings, not the source files themselves.

Can Semantic Kernel memory work with any LLM?

Yes. Semantic Kernel's memory system is LLM-agnostic. The SDK supports OpenAI, Azure OpenAI, Hugging Face, and local models for both embeddings and completions. Your memory store is independent of your LLM choice, so you can switch models without re-architecting your memory layer.

What is the difference between Semantic Kernel Memory and Kernel Memory?

Semantic Kernel Memory is the built-in memory abstraction within the Semantic Kernel SDK for storing and retrieving embeddings. Kernel Memory is a separate Microsoft project that provides a full document ingestion pipeline with automatic parsing, chunking, and embedding of PDFs, Word docs, and other file types. Kernel Memory is better suited for document-heavy RAG scenarios at scale. You can use them together, with Kernel Memory handling ingestion and Semantic Kernel handling agent logic.
