
When to outgrow your framework's built-in memory

LangChain, LlamaIndex, and CrewAI all ship with memory. None of it was designed for what you are trying to do with it now.

Jonathan Bree

You started with the memory module your framework ships with. It worked. Then it didn't. Here is what is happening and what to do about it.

LangChain, LlamaIndex, CrewAI, and most other agent frameworks ship with some form of memory. For prototyping, it is exactly what you need. It is right there, it integrates in a few lines, and it makes your agent feel stateful without requiring you to think about storage architecture before you have validated the idea.

The problems start when you move toward production.

What framework memory is actually doing

Framework memory is almost always a variation on the same pattern: store recent conversation history, summarise it when it gets too long, inject it into the prompt. It is in-context memory. The history lives in the context window, not in persistent storage.
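To make the pattern concrete, here is a minimal sketch of it in plain Python. It is not any framework's actual implementation: the InContextMemory class, its max_messages setting, and the summarise stand-in (which would be an LLM call in practice) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


def summarise(messages: list[str]) -> str:
    """Hypothetical stand-in for an LLM summarisation call."""
    return f"[summary of {len(messages)} earlier messages]"


@dataclass
class InContextMemory:
    """Keeps recent turns verbatim and folds older ones into a running summary."""

    max_messages: int = 4
    summary: str = ""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > self.max_messages:
            overflow = self.messages[:-self.max_messages]
            self.messages = self.messages[-self.max_messages:]
            # The detail in `overflow` is dropped here; only the summary survives.
            prior = [self.summary] if self.summary else []
            self.summary = summarise(prior + overflow)

    def build_prompt(self, question: str) -> str:
        # Everything the model "remembers" is whatever gets injected here.
        parts = [self.summary] if self.summary else []
        return "\n".join(parts + self.messages + [question])
```

Note that all state lives on the object. A new process, or a new InContextMemory(), starts empty.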

This is a reasonable solution to a specific problem: making a single conversation feel coherent. It is not a solution to long-term memory across sessions, across users, or across time. The distinction matters more than most developers expect when they first hit it.

The ceilings you will hit

No cross-session persistence by default

This is the first wall most teams hit. LangChain's ConversationBufferMemory, LlamaIndex's chat history, CrewAI's memory module: by default, none of these persist between sessions. When the session ends, the memory ends with it. The next conversation starts from zero.

There are workarounds. You can serialise memory to a database between sessions. Teams build this themselves, and it works up to a point. But you are now maintaining infrastructure that was not the reason you chose a framework in the first place, and you are solving a problem the framework was not designed to solve.
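The do-it-yourself version usually looks something like the sketch below, which uses only the standard library's sqlite3 and json modules. The session_memory table, the function names, and the message shape (a list of dicts) are assumptions for illustration, not part of any framework.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store. Pass a file path to survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_memory ("
        "user_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
    )
    return conn


def save_messages(conn: sqlite3.Connection, user_id: str, messages: list[dict]) -> None:
    """Overwrite the stored history for one user at the end of a session."""
    conn.execute(
        "INSERT OR REPLACE INTO session_memory (user_id, messages) VALUES (?, ?)",
        (user_id, json.dumps(messages)),
    )
    conn.commit()


def load_messages(conn: sqlite3.Connection, user_id: str) -> list[dict]:
    """Fetch the stored history at the start of the next session."""
    row = conn.execute(
        "SELECT messages FROM session_memory WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

This gets history back into the buffer, but it only moves the problem: the reloaded history still has to fit in the context window, and nothing here decides which parts of it matter.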

Context window limits

In-context memory has a hard ceiling. As conversation history grows, it has to be truncated or summarised to fit within the model's context window. Summarisation loses detail. Truncation loses recency or relevance depending on the strategy. Either way, information is being discarded, and there is no principled mechanism for deciding what matters.

LangChain's ConversationSummaryMemory and similar approaches are honest about this tradeoff. They are designed for single-session coherence, not for remembering that a user told you something important three weeks ago.

No multi-session synthesis

A user mentions something relevant in a conversation in January. They reference it again, obliquely, in March. Framework memory has no mechanism for connecting these. Each session is isolated. The synthesis problem, assembling relevant information from multiple past conversations to answer a current question, is precisely what LongMemEval's hardest category tests. It is also precisely what framework memory cannot do.

No contradiction resolution

A user says they prefer Python. Later they switch to TypeScript. Framework memory does not track that the earlier preference exists, has been contradicted, and should be superseded. Both statements may be summarised into the context, leaving the model to guess which is current. At scale, across many users with evolving preferences and facts, this becomes a reliability problem.
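One way to picture what superseding means is a store that keeps every assertion but only serves the latest one. The sketch below is a toy under strong assumptions: Fact, FactStore, and the subject/attribute/value shape are hypothetical, and it assumes facts have already been extracted from free text, which is the hard part in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    superseded: bool = False


class FactStore:
    """Keeps every assertion but marks older, contradicted ones as superseded."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def current(self, subject: str, attribute: str) -> Optional[Fact]:
        for fact in reversed(self._facts):
            if (fact.subject, fact.attribute) == (subject, attribute) and not fact.superseded:
                return fact
        return None

    def assert_fact(self, subject: str, attribute: str, value: str) -> Fact:
        existing = self.current(subject, attribute)
        if existing is not None:
            if existing.value == value:
                return existing  # restating a known fact changes nothing
            existing.superseded = True  # contradicted: keep for history, stop serving it
        fact = Fact(subject, attribute, value)
        self._facts.append(fact)
        return fact

    def history(self, subject: str, attribute: str) -> list[str]:
        return [f.value for f in self._facts if (f.subject, f.attribute) == (subject, attribute)]
```

With this shape, asserting "Python" and then "TypeScript" for the same user and attribute leaves TypeScript as the current value while Python remains visible in the history, so the model is never handed both as equally valid.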

No temporal reasoning

Framework memory is largely timestamp-agnostic. It knows the order of messages within a session but has no model of time across sessions: when something was said, whether it is still current, or how to weight recency against relevance. Temporal questions such as "what was the latest decision on this?" or "has anything changed since we last discussed this?" are not addressable with in-context history.
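Weighting recency against relevance can be sketched as a simple scoring function. Everything here is an illustrative assumption: the Memory record, the 30-day half-life, the equal weighting, and the idea that a relevance score in [0, 1] arrives from some retriever. A real system would tune or replace all of these.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    text: str
    created_at: datetime
    relevance: float  # assumed to come from a retriever, in [0, 1]


def score(memory: Memory, now: datetime,
          half_life_days: float = 30.0, recency_weight: float = 0.5) -> float:
    """Blend relevance with an exponential recency decay."""
    age_days = (now - memory.created_at).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # halves every `half_life_days`
    return (1 - recency_weight) * memory.relevance + recency_weight * recency


def rank(memories: list[Memory], now: datetime) -> list[Memory]:
    """Return memories ordered from highest to lowest blended score."""
    return sorted(memories, key=lambda m: score(m, now), reverse=True)
```

The point is only that a timestamp has to exist on each memory before any of this is possible. An in-context transcript does not carry one across sessions.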

Framework lock-in

Memory built on LangChain's abstractions is LangChain memory. If you switch frameworks, or want to share memory state across multiple agents built on different frameworks, you rebuild from scratch. The memory layer is coupled to the framework choice in a way that becomes expensive to unwind.

What you actually need at production scale

The LongMemEval benchmark's six question types cover the memory capabilities that matter in production: single-session recall, preference tracking, assistant-provided information, multi-session synthesis, temporal reasoning, and knowledge update. Framework memory handles the first adequately in isolation. It cannot address the rest.

What long-term agent memory actually requires is a dedicated retrieval layer that sits outside the framework: persistent across sessions, capable of synthesising across multiple conversations, able to track how facts change over time, and able to resolve contradictions rather than surface them all equally. This is a different class of infrastructure from a context window management utility.

The graduation path

Switching frameworks is not the answer and is not necessary. LangChain, LlamaIndex, and CrewAI are good at what they do. The agent orchestration, tool use, and workflow management they provide are not the problem. The memory layer underneath is.

The practical path is to keep your framework and replace the memory module with a dedicated memory API. Your agent logic stays the same. Your framework stays the same. The memory calls that currently go to an in-context buffer go instead to a persistent, production-grade memory layer that handles cross-session recall, contradiction resolution, temporal tracking, and multi-session synthesis.
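In code, that usually means having the agent depend on a small memory interface instead of a framework class. The sketch below shows the shape under stated assumptions: MemoryBackend, InProcessMemory, and build_prompt are hypothetical names, the add and search signatures are invented for illustration and are not any vendor's actual API, and the keyword-overlap search is a deliberately naive stand-in.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """The only surface the agent code depends on (hypothetical interface)."""

    def add(self, user_id: str, text: str) -> None: ...

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]: ...


class InProcessMemory:
    """Naive keyword-overlap backend, useful only as a local stand-in."""

    def __init__(self) -> None:
        self._items: dict[str, list[str]] = {}

    def add(self, user_id: str, text: str) -> None:
        self._items.setdefault(user_id, []).append(text)

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(text.lower().split())), text)
            for text in self._items.get(user_id, [])
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [text for overlap, text in ranked if overlap > 0][:limit]


def build_prompt(memory: MemoryBackend, user_id: str, question: str) -> str:
    """Agent-side code: unchanged no matter which backend is plugged in."""
    context = memory.search(user_id, question)
    lines = ["Relevant memories:"] + [f"- {c}" for c in context]
    return "\n".join(lines + ["", f"Question: {question}"])
```

Swapping in a persistent backend then means writing one more class that satisfies MemoryBackend, for example a thin wrapper around an HTTP client, while build_prompt and the rest of the agent logic stay as they are.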

Exabase works alongside any agent framework via a straightforward API. You add a memory, you search memories, you get back context that is accurate and ready for your prompt. The framework does not need to know or care what is underneath. See the Memory API or the docs to get started.

The rule of thumb is simple. Framework memory is for prototyping. When your agents need to remember things across sessions, across time, and across users, you need something built for that specifically.