What is a memory layer?

Persistence is not memory. Retrieval is not memory. Here is what a memory layer is and why the distinction matters.

Jonathan Bree

A database stores data. A RAG pipeline retrieves documents. A memory layer does something neither was designed to do: maintain evolving, structured knowledge about a specific agent's context across time.

A memory layer is infrastructure that sits between an AI agent and its stored knowledge. It handles the full lifecycle of agent memory: extracting what matters from interactions, indexing it in a way that supports retrieval, surfacing the right information at query time, and evolving the knowledge base as facts change. It is not a component. It is a system.

What a memory layer is not

A database stores and retrieves data on request. It has no understanding of what the data means, no mechanism for deciding what is relevant to a given query, and no ability to resolve contradictions or track how facts change over time. Storing conversation history in Postgres or Redis gives you persistence. It does not give you memory.

A RAG pipeline retrieves documents from a static corpus at query time. It answers questions about a knowledge base. It does not maintain state about a specific user or context, does not update when facts change, and does not synthesise across sessions. RAG and memory are complementary. They are not the same thing. See RAG vs agent memory.

Conversation history is a record of what was said in a session. Injecting it into a context window is short-term memory, not a memory layer. When the session ends, the history either disappears or sits in a database with no retrieval intelligence on top of it. Neither constitutes a memory layer.

What a memory layer actually does

A memory layer operates across three phases that distinguish it from simpler retrieval infrastructure.

Ingestion and indexing. When new information arrives, a memory layer does not simply store it. It extracts what matters, chunks and enriches the content, builds relationships between concepts and entities, resolves contradictions with existing knowledge, and updates the knowledge graph to reflect the current state of the world. This is where memory drift is prevented and entity resolution happens.
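To make the ingestion phase concrete, here is a minimal Python sketch of two of its steps: entity resolution and contradiction resolution. Everything in it is an assumption made for illustration: the `Fact` and `MemoryStore` names, the alias table, and the "newer observation supersedes older" rule are hypothetical and do not describe how M-1 or any particular product works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Fact:
    entity: str
    attribute: str
    value: str
    observed_at: datetime
    superseded_by: "Fact | None" = None


@dataclass
class MemoryStore:
    # Hypothetical alias table: maps lowercase surface forms to a canonical entity name.
    aliases: dict[str, str] = field(default_factory=dict)
    # The currently believed fact for each (entity, attribute) pair.
    current: dict[tuple[str, str], Fact] = field(default_factory=dict)
    # Every distinct fact ever ingested, including superseded ones.
    history: list[Fact] = field(default_factory=list)

    def resolve(self, name: str) -> str:
        """Entity resolution: normalise the name, then map known aliases."""
        key = name.strip().lower()
        return self.aliases.get(key, key)

    def ingest(self, entity: str, attribute: str, value: str,
               observed_at: datetime) -> Fact:
        fact = Fact(self.resolve(entity), attribute, value, observed_at)
        key = (fact.entity, fact.attribute)
        existing = self.current.get(key)
        if existing is None:
            self.current[key] = fact
        elif existing.value == fact.value:
            # Same claim again: nothing new to record.
            return existing
        elif fact.observed_at >= existing.observed_at:
            # Contradiction: the newer observation becomes current,
            # and the old fact is kept but marked as superseded.
            existing.superseded_by = fact
            self.current[key] = fact
        else:
            # A late-arriving older observation never overwrites newer knowledge.
            fact.superseded_by = existing
        self.history.append(fact)
        return fact


store = MemoryStore(aliases={"j. smith": "jane smith"})
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)
store.ingest("Jane Smith", "employer", "Acme", t1)
store.ingest("J. Smith", "employer", "Globex", t2)
print(store.current[("jane smith", "employer")].value)  # Globex
```

The two mentions resolve to the same entity, and the later employer replaces the earlier one without erasing it. A plain database insert would instead leave two unrelated rows that disagree with each other.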

Retrieval. At query time, a memory layer does not run a single similarity search and return the top results. M-1's retrieval pipeline illustrates what this phase involves. Queries are decomposed into parallel passes, each targeting a distinct information need. Candidates are scored across multiple signals: semantic similarity, lexical precision, temporal salience, and importance. The final results are assembled from all of these passes rather than from a single query. This is what prevents semantic collapse at scale.
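The multi-signal scoring described above can be sketched in a few lines of Python. This is an illustrative toy, not M-1's pipeline: the weights, the 30-day half-life, the Jaccard overlap used as a stand-in for lexical precision, and the names `Memory`, `score`, and `retrieve` are all assumptions. The sub-queries also run sequentially here for simplicity, where a real system would likely run them in parallel.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    id: str
    text: str
    embedding: tuple[float, ...]
    age_days: float
    importance: float  # assumed to be normalised to the range 0..1


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical(query: str, text: str) -> float:
    """Jaccard overlap of lowercase tokens, a crude proxy for lexical precision."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the score halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def score(query: str, query_vec: tuple[float, ...], m: Memory) -> float:
    # Illustrative weights only; a real system would tune or learn these.
    return (0.5 * cosine(query_vec, m.embedding)
            + 0.2 * lexical(query, m.text)
            + 0.2 * recency(m.age_days)
            + 0.1 * m.importance)


def retrieve(sub_queries: list[tuple[str, tuple[float, ...]]],
             memories: list[Memory], k: int = 3) -> list[tuple[str, float]]:
    """Score every memory against every sub-query and keep each memory's best score."""
    best: dict[str, float] = {}
    for text, vec in sub_queries:
        for m in memories:
            s = score(text, vec, m)
            if s > best.get(m.id, -1.0):
                best[m.id] = s
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]


memories = [
    Memory("a", "moved to Berlin in March", (1.0, 0.0), age_days=10, importance=0.9),
    Memory("b", "likes espresso", (0.0, 1.0), age_days=200, importance=0.2),
]
sub_queries = [
    ("where does the user live", (1.0, 0.0)),
    ("coffee preference", (0.0, 1.0)),
]
results = retrieve(sub_queries, memories)
print([memory_id for memory_id, _ in results])
```

Because each memory keeps its best score across passes, a memory that matches only one sub-query can still surface, while recency and importance break ties between memories that are equally similar.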

Re-ranking and coherence. Retrieved candidates are not returned as a ranked list. They are evaluated for cross-memory coherence: contradictions within the retrieved set are identified and resolved, redundant memories are collapsed, and the final context is assembled in a form that is internally consistent and correctly ordered. The model receives a clean context, not a bag of similar items. This is why precise retrieval with a smaller model can outperform noisy retrieval with a larger one.
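A coherence pass of this kind might look like the following Python sketch. It assumes each retrieved candidate has already been structured as a subject, attribute, and value with an observation date, and it resolves contradictions with one simple policy: the most recent observation wins. The `Candidate` type and `assemble_context` function are hypothetical, and the policy is only one of several reasonable choices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    subject: str
    attribute: str
    value: str
    observed_on: date
    score: float


def assemble_context(candidates: list[Candidate]) -> list[Candidate]:
    """Keep one candidate per (subject, attribute), then order chronologically."""
    newest: dict[tuple[str, str], Candidate] = {}
    for c in candidates:
        key = (c.subject, c.attribute)
        kept = newest.get(key)
        # Prefer the most recent observation; break ties on retrieval score.
        # This collapses duplicates and resolves contradictions in one step.
        if kept is None or (c.observed_on, c.score) > (kept.observed_on, kept.score):
            newest[key] = c
    return sorted(newest.values(), key=lambda c: c.observed_on)


candidates = [
    Candidate("user", "city", "London", date(2023, 5, 1), 0.91),
    Candidate("user", "city", "Berlin", date(2024, 3, 1), 0.84),
    Candidate("user", "city", "Berlin", date(2024, 3, 1), 0.80),
    Candidate("user", "diet", "vegetarian", date(2022, 1, 10), 0.60),
]
context = assemble_context(candidates)
for c in context:
    print(f"{c.observed_on}: {c.subject} {c.attribute} = {c.value}")
# 2022-01-10: user diet = vegetarian
# 2024-03-01: user city = Berlin
```

The stale "London" memory has the highest similarity score, so a plain ranked list would put it first. The coherence pass drops it because a newer observation about the same attribute exists in the retrieved set.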

Exabase as a data layer

Exabase's Memory API implements a full memory layer: ingestion, indexing, entity resolution, contradiction resolution, hybrid retrieval, and re-ranking, managed as a single API surface. You add memories, you search memories, you get back context that is accurate and ready for your prompt.

Exabase extends this further into a complete data layer for agents. The Resources API handles document storage and retrieval for RAG workloads. Bases provide isolated cloud filesystems for agent workspaces. Memory, documents, and storage live in a single system, accessible through the same API, with no need to assemble multiple vendors.

The full picture of what M-1 does under the hood is in the research paper. To get started, see the docs.