
Why a vector database is not a memory system

Pinecone, Weaviate, Qdrant, and pgvector give you basic semantic search. Here is everything else a production memory system requires.

Jonathan Bree

Pinecone, Weaviate, Qdrant, and pgvector are excellent tools. They solve one part of the agent memory problem and leave the rest to you. Here is what the rest looks like.

When developers first need to give an AI agent persistent memory, vector search is the natural starting point. You have documents or conversation history, you embed them, you store the vectors, you retrieve by similarity at query time. It works. The first demo is convincing. Then production arrives and the gaps become visible one by one.

This is not a criticism of vector databases. Pinecone, Weaviate, Qdrant, and pgvector are well-engineered tools that do exactly what they say. The problem is not what they do. It is what they do not do, and how much engineering sits between a vector database and a system that genuinely remembers.

The Retrieval Problem

A vector database gives you semantic similarity search. You embed a query, find the nearest vectors, and return the top results. This corresponds to the semantic similarity term, Sₛₑₘ, in M-1's retrieval scoring: one signal among several, and the most straightforward one to implement.

What it does not give you is query decomposition. A user asking "what did we decide about the API design and how does that affect the deadline we discussed last month" is asking two related questions that draw on different parts of memory. A single vector query will surface whatever is most similar to the combined query string, which is unlikely to retrieve both fragments cleanly. A memory system decomposes that query into parallel retrieval passes, each targeting a distinct information need, then assembles the results. This is critical for multi-session questions, where the answer is distributed across multiple conversations rather than sitting in one place.
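As a rough illustration of the idea, the sketch below splits a compound question into sub-queries, runs one retrieval pass per sub-query in parallel, and merges the results. Everything in it is an assumption made for the example: the `decompose` function is a naive split on "and" (a real decomposer would more likely use a language model), `retrieve` is a word-overlap stand-in for a vector search, and the three memories are invented.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Invented toy memories: (memory_id, text).
MEMORIES = [
    ("m1", "We decided the API design will use REST with versioned endpoints."),
    ("m2", "The deadline discussed last month is the end of Q3."),
    ("m3", "Lunch order for the team offsite was pizza."),
]

QUERY = ("what did we decide about the API design and how does that "
         "affect the deadline we discussed last month")


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(query: str) -> list[str]:
    """Hypothetical decomposer: split on the word 'and'."""
    parts = [p.strip() for p in re.split(r"\s+and\s+", query) if p.strip()]
    return parts or [query]


def retrieve(query: str, k: int = 1) -> list[tuple[str, float]]:
    """Stand-in for a vector search: rank memories by word overlap."""
    q = tokens(query)
    scored = [(mid, len(q & tokens(text)) / max(len(q), 1))
              for mid, text in MEMORIES]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def retrieve_decomposed(query: str, k: int = 1) -> list[str]:
    """Run one retrieval pass per sub-query in parallel, then merge."""
    sub_queries = decompose(query)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: retrieve(s, k), sub_queries))
    best: dict[str, float] = {}
    for results in result_lists:
        for mid, score in results:
            best[mid] = max(score, best.get(mid, 0.0))  # keep best score per memory
    return sorted(best, key=best.get, reverse=True)


single_pass = [mid for mid, _ in retrieve(QUERY, k=1)]
decomposed = retrieve_decomposed(QUERY, k=1)
```

In this toy setup, the single pass over the combined string returns only the deadline memory, while the decomposed passes return both the API decision and the deadline. The merge step here simply keeps the best score per memory; how a production system assembles sub-results is a separate design choice.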

There is also the question of lexical precision. Semantic similarity finds things that mean similar things. It does not always find things that contain specific terms, names, identifiers, or exact phrases that matter for a given query. A production memory system combines semantic search with lexical matching so that exact relevance can override approximate similarity when the situation calls for it. Vector search alone cannot make that judgement.
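A minimal sketch of that combination, under stated assumptions: the blend is a simple weighted sum controlled by a hypothetical `alpha` parameter, the lexical score is the fraction of query terms found verbatim in the text, and the two-dimensional vectors are hand-made rather than produced by a real embedding model. Production systems often use BM25 or rank fusion instead; this only shows the principle.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def terms(text: str) -> set[str]:
    # Keep hyphens so identifiers such as "PROJ-1482" stay in one piece.
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def lexical_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    q = terms(query)
    return len(q & terms(text)) / len(q) if q else 0.0


def hybrid_score(query: str, query_vec: list[float],
                 text: str, text_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Weighted blend: alpha * semantic + (1 - alpha) * lexical."""
    return (alpha * cosine(query_vec, text_vec)
            + (1 - alpha) * lexical_score(query, text))


# Hand-made example: doc_b is closer in embedding space,
# but only doc_a contains the exact ticket identifier.
query, query_vec = "status of ticket PROJ-1482", [0.8, 0.6]
doc_a = ("Ticket PROJ-1482 was closed on Friday.", [0.6, 0.8])
doc_b = ("The issue tracker item was resolved last week.", [0.8, 0.6])

semantic_a = cosine(query_vec, doc_a[1])
semantic_b = cosine(query_vec, doc_b[1])
hybrid_a = hybrid_score(query, query_vec, *doc_a)
hybrid_b = hybrid_score(query, query_vec, *doc_b)
```

On semantic similarity alone `doc_b` ranks first; once the exact identifier match is counted, `doc_a` overtakes it.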

The Time Problem

Vector databases store embeddings. Embeddings do not have an inherent relationship with time. A fact stored six months ago sits in the same index as a fact stored yesterday, retrievable with equal ease, surfaced with equal confidence.

This creates two problems in production.

The first is staleness. When facts change, the old version remains in the store. Query for anything related to that fact and both versions may surface, with no signal about which is current. A memory system applies temporal salience: recent information is more accessible, older information of high relevance persists, and the interaction between recency and relevance is modelled explicitly rather than ignored.
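One way to model that interaction, offered as a sketch rather than as how any particular system does it: decay a memory's weight exponentially with age, but keep a floor so that a highly relevant old memory is never discounted to zero. The half-life, the floor, and the multiplicative form are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def recency(stored_at: datetime, now: datetime,
            half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    age_days = max((now - stored_at).total_seconds(), 0.0) / 86400
    return 0.5 ** (age_days / half_life_days)


def temporal_score(relevance: float, stored_at: datetime, now: datetime,
                   half_life_days: float = 30.0, floor: float = 0.3) -> float:
    """Scale relevance by recency, never discounting below `floor`."""
    decay = recency(stored_at, now, half_life_days)
    return relevance * (floor + (1.0 - floor) * decay)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
yesterday = now - timedelta(days=1)
six_months_ago = now - timedelta(days=180)

# Same relevance, different age: the newer memory wins.
fresh = temporal_score(0.8, yesterday, now)
stale = temporal_score(0.8, six_months_ago, now)

# An old but highly relevant memory can still beat a recent, weakly relevant one.
old_but_relevant = temporal_score(0.9, six_months_ago, now)
recent_but_weak = temporal_score(0.2, yesterday, now)
```

The floor is what lets older information of high relevance persist: without it, any six-month-old memory would score close to zero under a 30-day half-life.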

The second is contradiction. A user tells the agent their project deadline is end of Q3. Three months later it moves to Q1. A vector store surfaces both statements with comparable similarity scores, because both are semantically close to queries about deadlines, and nothing in those scores says which one is current. A memory system detects the contradiction, identifies which statement is more recent, and resolves it before the context reaches the model. This is not a feature you get from a vector database. It is a layer you build on top of it, or one that Exabase provides out of the box.
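The resolution step can be sketched as follows. The sketch assumes the hard part has already happened upstream: statements have been normalised into (subject, attribute, value) facts, so that two statements about the same deadline share a key. The `Fact` type, its fields, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    stated_on: date


def find_contradictions(facts: list[Fact]) -> set[tuple[str, str]]:
    """Return the (subject, attribute) keys that carry more than one value."""
    values: dict[tuple[str, str], set[str]] = {}
    for fact in facts:
        values.setdefault((fact.subject, fact.attribute), set()).add(fact.value)
    return {key for key, seen in values.items() if len(seen) > 1}


def resolve(facts: list[Fact]) -> list[Fact]:
    """Keep only the most recently stated fact for each key."""
    latest: dict[tuple[str, str], Fact] = {}
    for fact in facts:
        key = (fact.subject, fact.attribute)
        current = latest.get(key)
        if current is None or fact.stated_on > current.stated_on:
            latest[key] = fact
    return list(latest.values())


facts = [
    Fact("project", "deadline", "end of Q3", date(2024, 1, 10)),
    Fact("project", "deadline", "Q1", date(2024, 4, 12)),
    Fact("project", "owner", "Dana", date(2024, 1, 10)),
]

conflicts = find_contradictions(facts)
resolved = resolve(facts)
```

"Most recent wins" is the simplest possible policy. It is shown here because it matches the deadline example; a real system may also need to weigh source reliability or keep the superseded value as history.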

Temporal reasoning is one of LongMemEval's six test categories and one of the hardest. Systems that rely primarily on vector search consistently underperform on it regardless of model size.

The Consistency Problem

Even when retrieval surfaces the right memories individually, there is no guarantee they form a coherent context when assembled together. Two retrieved fragments may be semantically relevant to the query but factually inconsistent with each other. A third may be relevant but redundant, adding noise rather than signal. A fourth may assume context that is not present in the others.

A vector database returns a ranked list of similar items. It has no mechanism for evaluating whether those items are consistent with each other, whether they collectively tell a coherent story, or whether the assembled context will lead the model toward a correct answer or a confused one.

A memory system adds cross-memory coherence: retrieved candidates are evaluated not just for individual relevance but for how they relate to each other. Contradictions within the retrieved set are identified and resolved. Redundant memories are collapsed. The final context presented to the model is internally consistent and correctly ordered, not just a bag of similar items.
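Two parts of that step, collapsing redundant memories and ordering what remains, can be sketched in a few lines. The assumptions are loud ones: redundancy is approximated by Jaccard overlap of word sets, the 0.6 threshold is arbitrary, and the `Candidate` type is invented for the example. A real system would more likely compare embeddings or extracted facts.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float     # relevance score from retrieval
    timestamp: int   # e.g. Unix seconds when the memory was stored


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def assemble(candidates: list[Candidate], threshold: float = 0.6) -> list[Candidate]:
    """Drop near-duplicates (keeping the higher-scored copy), then order by time."""
    kept: list[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if all(jaccard(cand.text, k.text) < threshold for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.timestamp)


candidates = [
    Candidate("The launch date is March 3rd.", score=0.9, timestamp=200),
    Candidate("Launch date is March 3rd.", score=0.8, timestamp=150),
    Candidate("Marketing needs two weeks of lead time.", score=0.7, timestamp=100),
]

context = assemble(candidates)
```

Here the second candidate is dropped as a near-duplicate of the first, and the two survivors are returned oldest first so the model reads them in the order they were recorded.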

This is also where importance scoring comes in. Not all memories are equally relevant to a given query. Some facts are foundational. Others are incidental. A system that weights memories equally will surface noise alongside signal. Importance scoring, informed by how often a memory has been retrieved, how central it is to the user's history, and how directly it bears on the current query, filters that noise before it reaches the model.
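A hedged sketch of how those three inputs might be folded into one number. The weighted sum, the log scaling of retrieval counts, the weights, and the cut-off are all assumptions for illustration; the inputs `centrality` and `query_relevance` are taken as already computed values in the range 0 to 1.

```python
import math


def importance(retrieval_count: int, centrality: float, query_relevance: float,
               max_count: int = 100,
               weights: tuple[float, float, float] = (0.2, 0.3, 0.5)) -> float:
    """Combine usage frequency, centrality, and query relevance into [0, 1]."""
    # Log scaling so the hundredth retrieval matters less than the second.
    frequency = min(math.log1p(retrieval_count) / math.log1p(max_count), 1.0)
    w_freq, w_central, w_query = weights
    return w_freq * frequency + w_central * centrality + w_query * query_relevance


def filter_noise(memories: dict[str, float], cutoff: float = 0.4) -> list[str]:
    """Keep only memories whose importance reaches the (arbitrary) cutoff."""
    return [name for name, score in memories.items() if score >= cutoff]


scores = {
    "foundational": importance(retrieval_count=50, centrality=0.9, query_relevance=0.8),
    "incidental": importance(retrieval_count=1, centrality=0.1, query_relevance=0.3),
}
kept = filter_noise(scores)
```

With these example inputs the foundational memory clears the cutoff and the incidental one does not.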

What's Needed to Build a Memory System

To turn a vector database into a real memory system, you would need to build the following on top of Pinecone, Weaviate, Qdrant, or pgvector:

Query decomposition for multi-part and multi-session questions.
Hybrid retrieval combining semantic similarity with lexical precision.
Temporal salience so recency interacts correctly with relevance.
Contradiction detection across stored facts.
Knowledge update so new facts supersede old ones rather than coexisting with them.
Entity resolution so fragmented references to the same concept link together.
Importance scoring so not all memories are weighted equally.
Cross-memory coherence so retrieved context is internally consistent.
Re-ranking to assemble a clean final context from the retrieved candidates.

Each of these is a non-trivial engineering problem. Together they constitute the difference between a retrieval index and a memory system.

M-1's retrieval score for a given memory can be characterised as combining semantic similarity, lexical precision, temporal salience, importance scoring, and cross-memory coherence, with query decomposition running upstream and re-ranking running downstream. A vector database gives you one of those signals. The rest is what a memory layer adds.
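The source does not give the formula, so the following is only an illustration of what "combining" several signals can mean: a weighted linear sum with made-up weights. It is not M-1's actual scoring function. The `Signals` type, the weight values, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Signals:
    semantic: float     # vector similarity
    lexical: float      # exact-term match
    temporal: float     # recency-aware salience
    importance: float   # how foundational the memory is
    coherence: float    # agreement with the other retrieved memories


# Hypothetical weights that sum to 1.0; a real system would tune or learn these.
WEIGHTS = Signals(semantic=0.35, lexical=0.20, temporal=0.20,
                  importance=0.15, coherence=0.10)


def combined_score(signals: Signals, weights: Signals = WEIGHTS) -> float:
    """Weighted linear combination of the five signals."""
    return sum(s * w for s, w in zip(astuple(signals), astuple(weights)))


# A stale memory that is slightly closer in embedding space...
stale = Signals(semantic=0.9, lexical=0.2, temporal=0.1, importance=0.3, coherence=0.2)
# ...versus a current one that matches on the other signals.
current = Signals(semantic=0.8, lexical=0.6, temporal=0.9, importance=0.6, coherence=0.8)

ranked_by_vector_only = sorted([stale, current], key=lambda s: s.semantic, reverse=True)
ranked_by_all_signals = sorted([stale, current], key=combined_score, reverse=True)
```

Ranking on the semantic signal alone puts the stale memory first; ranking on the combined score puts the current one first. That reversal is the practical meaning of "one signal among several".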

The LongMemEval results make this concrete. Systems that pair vector search with large frontier models score 85 to 94 percent. M-1, using a multi-signal architecture with a smaller and cheaper model, scores 96.4 percent. The gap is not explained by model power. It is explained by retrieval architecture.

The Build vs. Use Decision

You can build all of this yourself on top of a vector database. Teams do. It takes significant time, requires ongoing maintenance, and the engineering involved is largely undifferentiated: it does not make your product better, it just makes your memory layer adequate.

Exabase provides all of it out of the box. The Memory API handles query decomposition, hybrid retrieval, temporal salience, contradiction resolution, importance scoring, cross-memory coherence, and re-ranking. You make API calls. You get back context that is accurate, consistent, and ready for your prompt.

Vector databases are a foundation. Exabase is a complete solution. The question is how much of that foundation you want to lay yourself. See the docs to get started.