
RAG vs agent memory: What's the difference?

Both RAG and agentic memory involve embeddings, indexing, and retrieval. That is where the similarity ends.

Jonathan Bree

Developers reach for RAG because it is the retrieval paradigm they know. Then they wonder why their agent does not remember anything. Here is where the confusion comes from and how to think about it correctly.

Retrieval-augmented generation and agent memory are both retrieval problems. Both involve storing information somewhere and fetching it at query time. The similarity ends there. They are solving different problems, against different assumptions, for different reasons. Conflating them leads to architectures that do one job adequately and the other not at all.

What RAG is

RAG retrieves documents from a corpus at query time to augment a prompt. You have a knowledge base: product documentation, a legal archive, a codebase, a set of research papers. A user asks a question. The system finds the most relevant documents and includes them in the context alongside the question. The model answers from that context.

RAG assumes a static corpus. The documents exist. They are indexed. They do not change between indexing runs, and nothing a user says at query time alters them. The system's job is to find the right ones and surface them cleanly. It is fundamentally a document retrieval problem, and vector search is a natural fit for it because the goal is semantic similarity between a query and a fixed set of items.
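To make that concrete, here is a minimal sketch of the retrieve-then-augment loop in Python. Everything in it is an assumption for illustration: the three-document corpus is made up, and `embed` is a toy bag-of-words counter standing in for a real embedding model. The shape is what matters: index once, rank by similarity at query time, paste the winners into the prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical static corpus, indexed once up front.
CORPUS = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first day of each month.",
    "The API rate limit is 100 requests per minute per key.",
]
INDEX = [(doc, embed(doc)) for doc in CORPUS]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, k: int = 1) -> str:
    """Augment the question with the retrieved documents."""
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("What is the API rate limit?")
```

Note what is absent: nothing in this loop writes back to the corpus, and nothing in it knows who is asking.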

RAG answers the question: what is in this corpus that is relevant to this query?

It is very good at that. It is not designed to do anything else.

What agent memory is

Agent memory persists learned facts, preferences, decisions, and state across sessions and evolves as new information comes in. It is not about retrieving documents from a fixed corpus. It is about tracking what the system knows about a specific user, context, or ongoing situation, and updating that knowledge as things change.

A user tells the agent their preferred coding language. A month later they switch. Agent memory tracks that the original preference existed, that it has been contradicted, and that the more recent version should supersede the earlier one. A user makes a decision in January that is relevant to a question they ask in March. Agent memory synthesises those two sessions and surfaces the connection. A user's project deadline moves. Agent memory resolves the contradiction between the old deadline and the new one before it reaches the model.

These are capabilities that RAG architectures do not attempt, because RAG assumes a static corpus. A knowledge base of documents does not update itself when a user changes their mind. It does not track contradictions between past and present states. It does not know that something said six months ago has been superseded.

Agent memory answers the question: what do I know about this user, this context, or this situation, and how has it changed?
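The preferred-language example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular memory product works: `Fact` and `MemoryStore` are hypothetical names, facts are keyed by a `(subject, attribute)` pair, and writes are assumed to arrive in chronological order.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Fact:
    value: str
    recorded_on: date
    superseded_on: Optional[date] = None  # None means the fact is still current


@dataclass
class MemoryStore:
    # Maps (subject, attribute) to that fact's history, oldest first.
    facts: dict = field(default_factory=dict)

    def remember(self, subject: str, attribute: str, value: str, when: date) -> None:
        history = self.facts.setdefault((subject, attribute), [])
        if history and history[-1].value == value:
            return  # Same value restated: nothing to update.
        if history:
            # A different value contradicts the current one, so mark it superseded.
            history[-1].superseded_on = when
        history.append(Fact(value, when))

    def current(self, subject: str, attribute: str) -> Optional[str]:
        history = self.facts.get((subject, attribute), [])
        return history[-1].value if history else None

    def history(self, subject: str, attribute: str) -> list:
        return list(self.facts.get((subject, attribute), []))


memory = MemoryStore()
memory.remember("user-1", "preferred_language", "Python", date(2024, 1, 10))
memory.remember("user-1", "preferred_language", "Rust", date(2024, 2, 12))

latest = memory.current("user-1", "preferred_language")
past = memory.history("user-1", "preferred_language")
```

The store keeps the old preference rather than deleting it, so the system can still answer "what did they use before?" while only the current value is surfaced by default.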

Where they differ

Dimension	RAG	Agent Memory
Corpus	Static documents	Evolving facts and state
Knowledge source	External knowledge base	User interactions and history
Update model	Manual re-indexing	Continuous, automatic
Retrieval goal	Relevant documents	Relevant facts about a specific entity
Temporal reasoning	None	Central
Contradiction handling	None	Built in
Scope	Query-level	Cross-session, cross-time
Personalisation	None	Core purpose

Where the conflation breaks down

The conflation happens because both RAG and agent memory involve embedding, indexing, and retrieval. If you are already running a RAG pipeline, reaching for it to solve memory feels natural. The same infrastructure, a different use case.

In practice the gaps appear quickly.

A user tells your agent something about themselves. You store it as a document in your RAG corpus. Next session, they update that fact. Now you have two documents: the original and the update. Your RAG system has no mechanism to know that the second supersedes the first. Both will surface with similar confidence on relevant queries. The model sees both and has to guess.
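Here is that failure reproduced in miniature. The sketch assumes Jaccard token overlap as a stand-in for embedding similarity, and the two stored statements are made up. Real embeddings would give close rather than identical scores, but the structural problem is the same: the ranking function has no notion of which statement is newer.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens (a crude stand-in for an embedding)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# Two hypothetical documents written to the corpus in different sessions.
stored = [
    "The user's preferred programming language is Python.",  # session 1
    "The user's preferred programming language is Rust.",    # session 2 (the update)
]

query = "What is the user's preferred programming language?"
scores = [similarity(query, doc) for doc in stored]

# Both statements differ from the query by the same number of tokens, so they
# tie. Similarity alone cannot tell the model that the second one wins.
tied = scores[0] == scores[1]
```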

A user makes a series of decisions across three conversations over two months. You want the agent to reason about how those decisions relate to a current question. RAG retrieves documents. It does not synthesise state across multiple sessions. You would need to retrieve all three conversations, hope they all surface, and then ask the model to connect them. This is not retrieval. It is asking the model to do the memory system's job.
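By contrast, a memory layer can keep related decisions linked at write time, so recall does not depend on three separate similarity hits. The sketch below is one hypothetical way to do that in Python: `Decision` and `DecisionLog` are illustrative names, and the `topic` tag is assumed to be assigned when the decision is recorded (how that tagging happens is out of scope here).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    session_id: str
    made_on: date
    topic: str  # Assumed to be assigned at write time.
    summary: str


class DecisionLog:
    def __init__(self) -> None:
        self._by_topic: dict = {}

    def record(self, decision: Decision) -> None:
        self._by_topic.setdefault(decision.topic, []).append(decision)

    def timeline(self, topic: str) -> list:
        """Every decision on a topic, oldest first, regardless of query wording."""
        return sorted(self._by_topic.get(topic, []), key=lambda d: d.made_on)


def summarise(timeline: list) -> str:
    """Render a timeline as compact context for the model."""
    return "\n".join(
        f"{d.made_on.isoformat()} ({d.session_id}): {d.summary}" for d in timeline
    )


log = DecisionLog()
log.record(Decision("s2", date(2024, 2, 3), "database", "Chose Postgres over MySQL."))
log.record(Decision("s1", date(2024, 1, 15), "database", "Ruled out a NoSQL store."))
log.record(Decision("s2", date(2024, 2, 3), "hosting", "Picked a managed cloud host."))
log.record(Decision("s3", date(2024, 3, 20), "database", "Added a read replica."))

database_context = summarise(log.timeline("database"))
```

All three database decisions come back together and in order, so the model is asked to reason over a complete timeline rather than whatever fragments happened to rank highly.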

A user's preferences evolve gradually. RAG treats each preference statement as an independent document. There is no profile, no accumulation, no model of who this user is over time. You get a bag of relevant statements, not a coherent understanding of a person.

LongMemEval's hardest categories (temporal reasoning, knowledge update, and multi-session synthesis) are not hard because they require a better model. They are hard because RAG architectures were not designed to address them. They require a different kind of system entirely.

How they work together

RAG and agent memory are not competing approaches. A complete agent architecture typically needs both, solving different problems in parallel.

RAG handles the knowledge base: your product documentation, your company's internal data, your domain corpus. When a user asks a question that requires external knowledge, RAG retrieves the relevant documents. This is what it was built for and it does it well.

Agent memory handles the user layer: who this person is, what they have told the system, what has changed, what decisions they have made, what context is relevant from past interactions. When a user asks a question that requires knowing something about them specifically, memory retrieves the relevant facts.

The two layers are complementary. A query might draw on both simultaneously: external knowledge from the corpus and personal context from memory. The retrieval problem is different in each case and requires a different architecture to solve it.
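As a rough sketch of how the two layers can meet in one prompt, the Python function below takes a document retriever and a memory lookup as plain callables and assembles their results side by side. The function names, signatures, and stub data are all assumptions for illustration; they are not the API of any specific product.

```python
from typing import Callable


def build_context(
    query: str,
    user_id: str,
    retrieve_documents: Callable[[str], list],
    recall_facts: Callable[[str, str], list],
) -> str:
    """Combine corpus knowledge and user memory into one prompt."""
    docs = retrieve_documents(query)      # RAG layer: what the corpus says
    facts = recall_facts(user_id, query)  # Memory layer: what we know about the user

    sections = []
    if facts:
        sections.append(
            "What we know about the user:\n" + "\n".join(f"- {f}" for f in facts)
        )
    if docs:
        sections.append("Reference documents:\n" + "\n".join(f"- {d}" for d in docs))
    sections.append(f"Question: {query}")
    return "\n\n".join(sections)


# Stub layers standing in for real retrieval systems.
def fake_retrieve(query: str) -> list:
    return ["The SDK is available for Python, Rust, and Go."]


def fake_recall(user_id: str, query: str) -> list:
    return ["Prefers Rust (since 2024-02-12; previously Python)."]


prompt = build_context(
    "Which SDK should I install?", "user-1", fake_retrieve, fake_recall
)
```

Keeping the two sources in separately labelled sections lets the model tell general knowledge from user-specific context instead of treating everything as one bag of retrieved text.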

Exabase supports both. The Memory API handles persistent agent memory: cross-session recall, contradiction resolution, temporal tracking, and multi-session synthesis. The Resources API handles document storage and retrieval for RAG workloads. Both are accessible via the same unified data layer, so a single query can draw on document knowledge and user memory simultaneously without managing two separate retrieval systems.

The distinction matters. RAG and agent memory are different tools for different problems. Building one when you need the other produces a system that retrieves documents well and remembers nothing. See the docs to get started with both.