
Episodic vs semantic memory for AI agents

Jonathan Bree


Cognitive science identified two distinct memory systems more than fifty years ago. Most AI memory products conflate them. Here is why the distinction matters and how M-1 uses both.


In 1972, psychologist Endel Tulving proposed a distinction that has shaped memory research ever since. Episodic memory is memory for events: what happened, when, and in what context. Semantic memory is memory for facts: what is true, independent of when or how it was learned. A person remembering a specific conversation they had last Tuesday is using episodic memory. A person knowing that Paris is the capital of France is using semantic memory. The two systems are distinct, complementary, and both necessary for intelligent behaviour.

Most AI memory systems are built as if only one of these exists.


Episodic memory in AI agents

Episodic memory in an AI context is the record of what happened: conversations, interactions, exchanges, events. Raw conversation logs are episodic. They capture the what and when of an interaction without distilling it into abstracted knowledge.

Systems that store episodic memory give agents access to history. You can retrieve what a user said in a past session. You can surface the context of a previous decision. You can find the conversation where a particular topic came up. The corpus is rich and complete because nothing has been discarded.
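To make the episodic side concrete, here is a minimal Python sketch of a pure episodic store. It assumes each turn is kept verbatim with a session id and timestamp. Episode, EpisodicStore and their methods are hypothetical names for illustration, not part of the Exabase Memory API, and a case-insensitive substring match stands in for the similarity search a production system would use.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Episode:
    """One raw conversation turn, kept verbatim with its context."""
    session_id: str
    timestamp: datetime
    speaker: str
    text: str


@dataclass
class EpisodicStore:
    """Append-only log of raw turns: nothing is distilled or discarded."""
    episodes: List[Episode] = field(default_factory=list)

    def append(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def session(self, session_id: str) -> List[Episode]:
        """Replay everything said in one past session, in time order."""
        return sorted(
            (e for e in self.episodes if e.session_id == session_id),
            key=lambda e: e.timestamp,
        )

    def mentioning(self, term: str) -> List[Episode]:
        """Find every turn where a topic came up (case-insensitive substring match)."""
        needle = term.lower()
        return [e for e in self.episodes if needle in e.text.lower()]


store = EpisodicStore()
store.append(Episode("s1", datetime(2024, 3, 5, 10, 0), "user", "Please keep answers short."))
store.append(Episode("s2", datetime(2024, 3, 12, 9, 30), "user", "Can you review my Python script?"))
store.append(Episode("s1", datetime(2024, 3, 5, 9, 58), "user", "Hi, I'm planning a trip to Lisbon."))

print([e.text for e in store.session("s1")])
# ["Hi, I'm planning a trip to Lisbon.", 'Please keep answers short.']
print([e.session_id for e in store.mentioning("python")])
# ['s2']
```

Every query here scans the whole log, which is fine for three turns and is exactly the cost that grows as the corpus does.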

The limitation of pure episodic storage is retrieval at scale. As the corpus grows, relevant episodes become harder to surface. A question about a user's preference for concise answers requires finding the session where that preference was expressed, which may be buried in thousands of turns of unrelated conversation. Retrieval becomes a needle-in-a-haystack problem. The information is there. Finding it reliably is not guaranteed.


Semantic memory in AI agents

Semantic memory in an AI context is the distilled knowledge extracted from interactions: facts, preferences, relationships, decisions. Rather than storing raw conversation history, a semantic memory system extracts what matters and stores it in a structured, retrievable form.

Systems that store semantic memory give agents access to a clean, queryable model of what they know. User preferences are stored as facts. Decisions are stored as records. Relationships between concepts are mapped explicitly. Retrieval is fast and precise because the corpus is compact and structured.
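A semantic store, by contrast, might look like the following minimal sketch, which assumes facts are kept as (subject, predicate, object) triples. Fact, SemanticStore and the overwrite-on-update policy are illustrative assumptions; real systems differ in how they represent and update facts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """A distilled (subject, predicate, object) triple with no conversational context."""
    subject: str
    predicate: str
    obj: str


class SemanticStore:
    """Compact fact store indexed by (subject, predicate) for exact lookup."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Fact] = {}

    def upsert(self, fact: Fact) -> None:
        # Assumed policy: a newer fact with the same subject and predicate replaces the old one.
        self._index[(fact.subject, fact.predicate)] = fact

    def get(self, subject: str, predicate: str) -> Optional[str]:
        fact = self._index.get((subject, predicate))
        return fact.obj if fact is not None else None

    def about(self, subject: str) -> List[Fact]:
        """Everything known about one entity, e.g. a user profile."""
        return [f for f in self._index.values() if f.subject == subject]


facts = SemanticStore()
facts.upsert(Fact("user", "prefers_language", "Python"))
facts.upsert(Fact("user", "prefers_answer_length", "concise"))
facts.upsert(Fact("project_x", "chose_database", "PostgreSQL"))
facts.upsert(Fact("user", "prefers_answer_length", "very concise"))  # overwrites the earlier value

print(facts.get("user", "prefers_answer_length"))  # very concise
print(len(facts.about("user")))  # 2
```

Lookup is a dictionary hit rather than a search, which is where the speed and precision come from. The cost is that nothing about when or why a fact was stated survives the write.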

The limitation of pure semantic extraction is information loss. The extraction process decides what to keep and discards the rest. When the extractor makes the wrong call, the discarded information is gone. More subtly, extraction at ingestion time cannot anticipate every future query. Information that seemed incidental when stored may turn out to be exactly what is needed three months later. You cannot recover what was never kept.
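The loss is easy to reproduce in miniature. In the sketch below, a rule-based extractor stands in for an LLM (an assumption for illustration; the names and data are hypothetical). It keeps the preferences it was designed to find and drops a detail that only later turns out to matter.

```python
import re
from typing import List

# Hypothetical raw turns from one session.
conversation = [
    "I prefer Python for data work.",
    "By the way, my manager Priya signs off on all vendor contracts.",
    "Please keep your answers concise.",
]

# A toy rule-based extractor standing in for an LLM. Like any extractor, it can
# only keep what it was designed to look for at ingestion time.
RULES = [
    (re.compile(r"\bI prefer (\w+)", re.IGNORECASE), "user prefers {}"),
    (re.compile(r"\bkeep your answers (\w+)", re.IGNORECASE), "user wants {} answers"),
]


def extract_facts(turns: List[str]) -> List[str]:
    facts = []
    for turn in turns:
        for pattern, template in RULES:
            match = pattern.search(turn)
            if match:
                facts.append(template.format(match.group(1)))
    return facts  # any turn no rule matched is dropped here


def mentions(corpus: List[str], term: str) -> bool:
    return any(term.lower() in item.lower() for item in corpus)


facts = extract_facts(conversation)
print(facts)  # ['user prefers Python', 'user wants concise answers']

# Months later the agent is asked: "Who approves vendor contracts?"
print(mentions(facts, "Priya"))         # False: the extractor discarded it
print(mentions(conversation, "Priya"))  # True: the raw log still has it
```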

There is also the question of context. Extracted facts are decontextualised by design. A fact stored as "user prefers Python" has lost the conversation in which it was expressed, the reasoning behind it, and the caveats that accompanied it. That context may matter for answering future questions accurately.


Why most systems pick one

Building a memory system that handles both episodic and semantic memory is harder than building one that handles either alone. Pure episodic storage is relatively straightforward: store everything, retrieve by similarity. Pure semantic extraction is also relatively straightforward: run an LLM over the conversation, extract facts, store them.

The hard part is combining them: knowing when to retrieve raw episodic context and when to retrieve extracted semantic facts, how to weight each against the other, and how to assemble both into a coherent answer. This requires a retrieval architecture that treats memory as reconstructive rather than as simple lookup.
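One simple way to combine the two, sketched here under loud assumptions, is to let each store return scored candidates and re-weight them by memory type before assembling a single context. The cue-word router, the 0.7/0.3 weights and every name below are hypothetical; this illustrates the shape of the problem, not how M-1 makes the decision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    text: str
    source: str   # "episodic" (raw turn) or "semantic" (extracted fact)
    score: float  # relevance reported by that store's own retriever, in [0, 1]


# Hypothetical cue phrases suggesting a question about events rather than facts.
EPISODIC_CUES = ("when", "last time", "what did i say", "which session")


def source_weights(query: str) -> Dict[str, float]:
    """Heuristic stand-in for a learned router: how much to trust each memory type."""
    q = query.lower()
    if any(cue in q for cue in EPISODIC_CUES):
        return {"episodic": 0.7, "semantic": 0.3}
    return {"episodic": 0.3, "semantic": 0.7}


def assemble(query: str, candidates: List[Candidate], k: int = 3) -> List[str]:
    """Re-weight candidates from both stores and keep the top k as one context."""
    weights = source_weights(query)
    ranked = sorted(candidates, key=lambda c: c.score * weights[c.source], reverse=True)
    return [f"[{c.source}] {c.text}" for c in ranked[:k]]


candidates = [
    Candidate("user prefers Python", "semantic", 0.9),
    Candidate("2024-03-12, user: I moved to Python after the Rust prototype stalled.", "episodic", 0.8),
    Candidate("user works at a fintech startup", "semantic", 0.4),
]

print(assemble("What language does the user prefer?", candidates, k=1))
# ['[semantic] user prefers Python']
print(assemble("When did the user switch to Python?", candidates, k=1))
# ['[episodic] 2024-03-12, user: I moved to Python after the Rust prototype stalled.']
```

A fixed keyword router is brittle. It stands in for whatever query analysis a real system uses to decide how much each memory type should contribute.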

Reconstructive memory, a concept that goes back to Frederic Bartlett's 1932 book Remembering and informs Tulving's framework, is the idea that remembering is not the retrieval of a stored record. It is assembly: piecing together fragments, inference, and context into a coherent whole. Human memory does not play back recordings. It reconstructs, using both episodic fragments and semantic knowledge to produce an answer that is more than the sum of its parts.

This is the model M-1 is built on.


How M-1 uses both

M-1's architecture is explicitly informed by Tulving's episodic and semantic memory distinction, alongside Bartlett's reconstructive recall framework and temporal context models from Howard and Kahana. These are not decorative citations. They describe the actual design philosophy.

Rather than treating memory as a log to be searched or a fact database to be queried, M-1 treats retrieval as a reconstructive, multi-stage process. Candidate memories are scored using a combination of signals: semantic similarity, lexical precision, temporal salience, and importance. Queries are decomposed into parallel retrieval passes, each targeting a distinct information need. Retrieved fragments are assembled into a unified context, mirroring the reconstructive nature of episodic recall.
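The scoring and decomposition steps can be illustrated with a small Python sketch. This is not M-1's implementation: the weights, the 30-day half-life, the bag-of-words cosine standing in for embedding similarity, and the hand-written sub-queries (which a real system might generate with an LLM) are all assumptions.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Memory:
    text: str
    age_days: float    # time since the memory was written
    importance: float  # assumed to be assigned at write time, in [0, 1]


# Hypothetical weights and half-life; a real system would tune or learn these.
WEIGHTS = {"semantic": 0.4, "lexical": 0.3, "temporal": 0.1, "importance": 0.2}
HALF_LIFE_DAYS = 30.0


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def semantic_similarity(query: str, text: str) -> float:
    """Cosine over bag-of-words counts, standing in for embedding similarity."""
    q, m = Counter(tokens(query)), Counter(tokens(text))
    dot = sum(count * m[t] for t, count in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in m.values()))
    return dot / norm if norm else 0.0


def lexical_precision(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the memory."""
    q, m = set(tokens(query)), set(tokens(text))
    return len(q & m) / len(q) if q else 0.0


def temporal_salience(age_days: float) -> float:
    """Exponential decay: 1.0 when new, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def score(query: str, memory: Memory) -> float:
    signals = {
        "semantic": semantic_similarity(query, memory.text),
        "lexical": lexical_precision(query, memory.text),
        "temporal": temporal_salience(memory.age_days),
        "importance": memory.importance,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def retrieve(sub_queries: List[str], memories: List[Memory], k: int = 1) -> List[Memory]:
    """One retrieval pass per sub-query, run concurrently, then merged.

    Each memory keeps its best score across passes, so a fragment that serves
    any single information need can reach the assembled context.
    """
    def one_pass(query: str) -> List[Tuple[float, Memory]]:
        scored = [(score(query, m), m) for m in memories]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    best: Dict[Memory, float] = {}
    with ThreadPoolExecutor() as pool:
        for hits in pool.map(one_pass, sub_queries):
            for s, m in hits:
                best[m] = max(s, best.get(m, 0.0))
    return sorted(best, key=lambda m: best[m], reverse=True)


memories = [
    Memory("The user has a dog named Biscuit", age_days=90, importance=0.7),
    Memory("The user moved to Berlin in May", age_days=10, importance=0.5),
    Memory("The user asked for a pasta recipe", age_days=2, importance=0.1),
]
# Hand-written decomposition of "Where does the user live now, and what is their dog called?"
sub_queries = ["what is the user's dog called", "where the user moved to"]

for m in retrieve(sub_queries, memories, k=1):
    print(m.text)
```

The passes run on a thread pool because in practice each would typically wait on I/O, such as a vector index lookup; for in-memory toy data the concurrency only shows the structure. Keeping each memory's best score across passes is one merge policy among several.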

The multi-session reasoning category on LongMemEval illustrates why this matters. These questions cannot be answered by retrieving a single stored fact. The answer is distributed across multiple episodes from different sessions, none of which individually contains the complete picture. Answering correctly requires identifying the relevant episodes, retrieving the right fragments from each, and assembling them into a coherent semantic answer. That is reconstructive memory in practice.
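The shape of a multi-session question can be shown with toy data. In this hypothetical sketch, the question "How many books has the user finished?" has no answer in any single session; it only emerges once relevant fragments from several sessions are gathered and combined. The keyword filter and the counting step are illustrative assumptions, not M-1's retrieval logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    session_id: str
    text: str


# Hypothetical history: no single session answers "How many books has the user finished?"
history = [
    Fragment("s1", "I just finished Dune."),
    Fragment("s1", "Can you suggest a pasta recipe?"),
    Fragment("s2", "Gave up on Ulysses halfway through."),
    Fragment("s3", "Finished Piranesi last night, loved it."),
]


def relevant(fragment: Fragment) -> bool:
    """Keyword stand-in for scored retrieval (an assumption for illustration)."""
    return "finish" in fragment.text.lower()


def reconstruct(fragments: List[Fragment]) -> Tuple[int, Dict[str, List[str]]]:
    """Gather relevant fragments per session, then combine them into one answer."""
    by_session: Dict[str, List[str]] = defaultdict(list)
    for fragment in fragments:
        if relevant(fragment):
            by_session[fragment.session_id].append(fragment.text)
    total = sum(len(texts) for texts in by_session.values())
    return total, dict(by_session)


count, evidence = reconstruct(history)
print(count)     # 2
print(evidence)  # {'s1': ['I just finished Dune.'], 's3': ['Finished Piranesi last night, loved it.']}
```

Each session on its own supports at most one book; the answer of two exists only after assembly.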

On multi-session reasoning, the hardest category on LongMemEval, M-1 scores 94% at top-50. Systems that rely primarily on semantic extraction score lower because they discarded the episodic context needed to reconstruct the answer. Systems that rely primarily on episodic storage score lower because they cannot synthesise across sessions efficiently. The combination is what produces the result.


What this means for agent design

The episodic and semantic distinction has practical implications for how agent memory should be designed.

An agent that only stores extracted facts will lose context, miss nuance, and fail on questions that require assembling information from multiple past interactions. An agent that only stores raw conversation logs will struggle to surface relevant information efficiently as the corpus grows and will fail on questions that require a clean, structured answer rather than a fragment of dialogue.

A production memory system needs both: the richness of episodic storage and the precision of semantic extraction, with a retrieval architecture that can draw on either depending on what the question requires.
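A minimal sketch of that principle, assuming a design where every extracted fact keeps a pointer to the raw episode it came from. HybridMemory, its methods and the regex extractor are hypothetical and stand in for much richer machinery. The point is that a precise fact lookup can still return its original context, caveats included, and that details the extractor never kept remain searchable.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Episode:
    episode_id: int
    text: str


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    source_id: int  # links the fact back to the episode it was extracted from


class HybridMemory:
    """Keeps every raw episode plus a compact fact index that points back to them."""

    def __init__(self) -> None:
        self.episodes: Dict[int, Episode] = {}
        self.facts: Dict[str, Fact] = {}

    def write(self, text: str) -> None:
        episode = Episode(len(self.episodes), text)
        self.episodes[episode.episode_id] = episode  # episodic: nothing is discarded
        # Toy rule standing in for LLM extraction (an assumption for illustration).
        match = re.search(r"\bI prefer (\w+)", text, re.IGNORECASE)
        if match:
            self.facts["user_prefers"] = Fact("user_prefers", match.group(1), episode.episode_id)

    def recall_fact(self, key: str) -> Optional[Tuple[str, str]]:
        """Semantic lookup, returned with the episode that supports it."""
        fact = self.facts.get(key)
        if fact is None:
            return None
        return fact.value, self.episodes[fact.source_id].text

    def search_episodes(self, term: str) -> List[str]:
        """Fall back to raw history for details the extractor never kept."""
        needle = term.lower()
        return [e.text for e in self.episodes.values() if needle in e.text.lower()]


memory = HybridMemory()
memory.write("I prefer Python, but only for scripts; the backend stays in Go.")
memory.write("My manager Priya signs off on vendor contracts.")

print(memory.recall_fact("user_prefers"))
# ('Python', 'I prefer Python, but only for scripts; the backend stays in Go.')
print(memory.search_episodes("priya"))
# ['My manager Priya signs off on vendor contracts.']
```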

Exabase's Memory API is built on this principle. M-1 stores and retrieves across both memory types, assembling context reconstructively rather than looking up a single stored record. The result is a memory system that handles the full range of what agents actually need to remember. See the docs to get started.
