
Best agent memory platforms 2026

Every major memory API compared: benchmarks, features, pricing, and who each one is actually for.

Start for free

Read the docs

Last updated May 2026

Agent memory is the infrastructure layer that determines whether an AI agent is useful once or useful over time. This page covers the leading platforms, their benchmark results, feature scope, pricing, and honest tradeoffs.

The agent memory category has matured significantly in the past year. Where previously most teams were rolling their own solutions on top of vector databases or relying on framework memory modules, there are now dedicated platforms purpose-built for the problem. Choosing between them requires understanding what each actually does, not just what each claims.

This page covers six platforms in depth and three more briefly. Where individual comparison pages exist, they are linked for deeper analysis.

How we evaluate

Benchmark accuracy is the primary technical signal. LongMemEval is the most rigorous public benchmark for conversational memory retrieval, testing six distinct memory capabilities across 115,000 tokens of conversational history. It is not a perfect benchmark, but it is the closest thing the field has to a shared standard. Results are only comparable when the same model and methodology are used, which they rarely are. We note the model used alongside each score.

Feature scope covers what the platform actually does beyond core memory: document storage, graph memory, contradiction resolution, temporal reasoning, and entity resolution.

Pricing covers free tier availability, paid plan structure, and self-hosting options.

Open source matters for teams with data residency requirements, vendor lock-in concerns, or the desire to inspect and modify the codebase.

Architecture covers the retrieval approach at a high level: whether the system uses pure vector search, hybrid retrieval, graph-based retrieval, or a multi-signal approach.
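Of these approaches, hybrid retrieval is the easiest to show in a few lines. One common way to merge a vector-search ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. This is a generic illustration, not the implementation of any platform on this page. The memory IDs are hypothetical, and k=60 is the constant commonly used in the RRF literature rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of memory IDs into a single ranking.

    Each list contributes 1 / (k + rank) for every ID it contains, with rank
    starting at 1, so IDs that rank well in several lists rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


vector_hits = ["m3", "m1", "m7"]   # hypothetical output of an embedding search
lexical_hits = ["m1", "m9", "m3"]  # hypothetical output of a keyword (e.g. BM25) search
print(reciprocal_rank_fusion([vector_hits, lexical_hits]))
```

Here "m1" and "m3" appear in both lists, so they outrank "m9" and "m7", which each appear in only one.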

LongMemEval results

System	Model	Score (top-50)
M-1 (Exabase)	Gemini 3 Flash	96.4%
Mem0	Gemini 3 Pro	94.8%
Honcho	Gemini 3 Pro	92.6%
HydraDB	Gemini 3 Pro	90.79%
Supermemory	Gemini 3 Pro	85.2%

Note that all results except M-1 used Gemini 3 Pro. A larger model can compensate for weaker retrieval by extracting correct answers from noisy context. M-1's results with a smaller, cheaper model suggest the retrieval architecture is doing more of the work. See the full research paper for complete methodology.

Supermemory claims benchmark leadership on its homepage and GitHub. These results are self-reported and have not been independently verified by a third party.

Feature matrix

Feature	Exabase	Mem0	Supermemory	Zep	Letta	Honcho
LongMemEval score	96.4%	94.8%	85.2% (claimed)	Not published	Not published	92.6%
Memory API	Yes	Yes	Yes	Yes	Yes	Yes
Document storage	Yes	No	No	No	No	No
Bases (cloud filesystems)	Yes	No	No	No	No	No
Graph memory	Yes	Pro only ($249/mo)	Yes	Yes	No	No
Contradiction resolution	Yes	Partial	Yes	Yes	No	No
Temporal reasoning	Yes	Partial	Partial	Yes	No	Partial
Entity resolution	Yes	Partial	Partial	Yes	No	No
Sub-300ms retrieval	Yes	Not published	Yes (claimed)	Not published	Not published	Not published
Open source	No	Yes	Partial	Yes	Yes	Yes
Self-hostable	No	Yes	Enterprise only	Yes	Yes	Yes
Free tier	Yes	Yes	Yes	Yes	Yes	Yes
Pricing	Scale $149/mo	Pro $249/mo	Usage-based	Contact	$20-200/mo	Contact

Platform overviews

Exabase

Exabase is a managed memory API built for production AI agents. Its core engine, M-1, implements a multi-signal retrieval pipeline combining semantic search and lexical precision with temporal salience, importance scoring, and cross-memory coherence. Rather than storing raw conversation logs and running similarity search over them, Exabase builds a dynamic knowledge graph that evolves as new information comes in. Contradictions are resolved at the storage layer. Entity resolution links fragmented references automatically. Memory drift is prevented by tracking temporal relationships explicitly.
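The general idea of multi-signal retrieval can be sketched as a weighted score over several signals. This is a generic illustration under stated assumptions, not M-1's actual pipeline, whose internals this page does not specify. The weights, the 30-day half-life, the word-overlap stand-in for lexical matching, and the toy two-dimensional embeddings are all hypothetical, and cross-memory coherence is not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    age_days: float    # time since the memory was written
    importance: float  # 0..1, assumed to be assigned at write time


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query: str, text: str) -> float:
    # Fraction of query terms that appear verbatim in the memory text.
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    # Exponential decay: a memory loses half its temporal salience every half-life.
    return 0.5 ** (age_days / half_life_days)


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"semantic": 0.5, "lexical": 0.2, "recency": 0.2, "importance": 0.1}


def score(query: str, query_emb: list[float], m: Memory) -> float:
    return (WEIGHTS["semantic"] * cosine(query_emb, m.embedding)
            + WEIGHTS["lexical"] * lexical_overlap(query, m.text)
            + WEIGHTS["recency"] * recency(m.age_days)
            + WEIGHTS["importance"] * m.importance)


memories = [
    Memory("user lives in Berlin", [0.9, 0.1], age_days=400, importance=0.6),
    Memory("user moved to Lisbon", [0.8, 0.2], age_days=3, importance=0.8),
]
query = "where does the user live"
ranked = sorted(memories, key=lambda m: score(query, [1.0, 0.0], m), reverse=True)
print(ranked[0].text)
```

The older memory is slightly closer in embedding space, but the recency and importance signals push the newer one to the top, which is the kind of correction a similarity-only ranking cannot make.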

Alongside memory, Exabase offers document storage via the Resources API and isolated cloud filesystems via Bases, making it a full data layer for agents rather than a memory-only solution.

M-1 leads LongMemEval at 96.4% using Gemini 3 Flash, a smaller and cheaper model than the Gemini 3 Pro used for every other result in the table. The methodology is fully published and reproducible.

Strengths: benchmark-leading retrieval accuracy, full data layer including document storage and Bases, sub-300ms retrieval, contradiction and temporal resolution built in.

Limitations: no open source option, no self-hosted path, narrower connector coverage than some competitors.

Best for: production AI agents where retrieval accuracy and cost efficiency matter at scale. Teams that want a unified data layer without assembling multiple tools.

Exabase vs Mem0 · Exabase vs Supermemory

Mem0

Mem0 is the default reference point for agent memory in 2025 and 2026. With around 48,000 GitHub stars and $24 million in funding, it has the largest developer community of any standalone memory platform. It offers a three-tier memory system covering user, session, and agent scopes, backed by a hybrid store combining vectors, graph relationships, and key-value lookups.

Mem0 scores 94.8% on LongMemEval using Gemini 3 Pro. As noted in Exabase's research paper, the benchmarking script Mem0 published contained question-category-specific prompt templates designed to improve scores on particular question types. Exabase removed these and used a uniform prompt, achieving a higher score with a cheaper model.

Graph memory is available on the Pro plan at $249 per month. The core product is open source and self-hostable.

Strengths: largest community, broadest framework integrations, open source, self-hostable, AWS Strands native integration.

Limitations: graph memory paywalled at $249/mo, no document storage or filesystem layer, only partial temporal reasoning.

Best for: teams that weight open source, community ecosystem, and framework integration breadth. Early-stage projects benefiting from a large body of existing examples and integrations.

Full comparison: Exabase vs Mem0

Supermemory

Supermemory is a relatively new managed memory API with a generous free tier and strong MCP integrations for Claude Code, Cursor, and Windsurf. It claims benchmark leadership on LongMemEval, LoCoMo, and ConvoMem on its homepage. These results are self-reported and have not been independently verified. Third-party evaluations have returned materially lower scores than Supermemory's published figures.

On LongMemEval, Supermemory scores 85.2% using Gemini 3 Pro, the largest gap from M-1 of any system in the comparative table. An 11-point deficit with a more expensive model points to fundamental differences in retrieval architecture rather than a tuning gap. See why vector search breaks at scale for the architectural reasons behind this kind of gap.

Supermemory offers connectors for Google Drive, Gmail, Notion, and GitHub, which is broader connector coverage than most competitors. Self-hosting requires an enterprise agreement.

Strengths: generous free tier, MCP-native integrations for coding agents, broad connectors, fast setup.

Limitations: largest benchmark gap in the field, self-reported results not independently verified, self-hosting enterprise only, closed source core.

Best for: early-stage projects and coding agent workflows where MCP integrations and speed of setup matter more than benchmark accuracy.

Full comparison: Exabase vs Supermemory

Zep

Zep is a memory platform built around a temporal knowledge graph called Graphiti. Where most memory systems treat facts as timestamped snapshots, Zep's graph stores fact validity windows: it knows not just when something was said but for how long it remained true. This makes it particularly strong on temporal reasoning tasks.
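A fact with a validity window can be sketched as below. This is a minimal illustration of the concept, not Graphiti's data model or API. The Fact class and the add_fact and value_on helpers are hypothetical, and the sketch assumes facts arrive in chronological order and that a new value for the same subject and predicate always supersedes the old one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true

    def holds_on(self, day: date) -> bool:
        # A fact holds from valid_from (inclusive) until valid_to (exclusive).
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def add_fact(facts: list[Fact], new: Fact) -> None:
    # Close the window of any open fact that the new one supersedes instead
    # of deleting it, so the history stays queryable.
    for f in facts:
        if (f.subject, f.predicate) == (new.subject, new.predicate) and f.valid_to is None:
            f.valid_to = new.valid_from
    facts.append(new)


def value_on(facts: list[Fact], subject: str, predicate: str, day: date) -> Optional[str]:
    for f in facts:
        if f.subject == subject and f.predicate == predicate and f.holds_on(day):
            return f.value
    return None


facts: list[Fact] = []
add_fact(facts, Fact("alice", "employer", "Acme", date(2022, 1, 10)))
add_fact(facts, Fact("alice", "employer", "Globex", date(2024, 6, 1)))
print(value_on(facts, "alice", "employer", date(2023, 3, 1)))  # Acme
print(value_on(facts, "alice", "employer", date(2025, 1, 1)))  # Globex
```

A snapshot store that only overwrote the employer would answer "Globex" for both dates; keeping the closed window lets the system answer questions about the past correctly.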

Zep is open source with a self-hosted community edition and a managed cloud offering. It has published strong results on temporal sub-tasks and is a credible option for agent workflows where temporal reasoning is the primary requirement.

Strengths: temporal knowledge graph architecture, open source, self-hostable, strong on temporal reasoning specifically.

Limitations: no published overall LongMemEval score, no document storage layer, advanced features cloud-only.

Best for: agents with complex temporal reasoning requirements, teams that want open source with a managed cloud option.

Letta

Letta is an agent framework with a tiered memory model inspired by operating system memory architecture. The agent manages its own memory tiers, deciding what stays in working memory versus long-term storage. This makes memory management transparent and customisable at the agent level.
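The paging idea can be sketched as a two-tier store. This is a simplified illustration, not Letta's API. The TieredMemory class is hypothetical, and a fixed priority rule stands in for the LLM that, in Letta's design, decides what to move out of working memory.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Two tiers: a small working set plus unbounded archival storage.

    In an LLM-driven design the model itself chooses what to page out;
    here a simple stand-in policy evicts the lowest-priority item.
    """
    capacity: int
    working: dict[str, int] = field(default_factory=dict)  # item -> priority
    archive: list[str] = field(default_factory=list)

    def remember(self, item: str, priority: int) -> None:
        self.working[item] = priority
        while len(self.working) > self.capacity:
            # Stand-in for the agent's decision: page out the least important item.
            victim = min(self.working, key=self.working.get)
            del self.working[victim]
            self.archive.append(victim)

    def recall(self, keyword: str) -> list[str]:
        # Working memory is always in context; archival memory must be searched.
        return [item for item in self.archive if keyword.lower() in item.lower()]


mem = TieredMemory(capacity=2)
mem.remember("user's name is Sam", priority=9)
mem.remember("user prefers dark mode", priority=3)
mem.remember("current task: draft Q3 report", priority=7)
print(list(mem.working))   # ["user's name is Sam", 'current task: draft Q3 report']
print(mem.recall("dark"))  # ['user prefers dark mode']
```

Replacing the priority rule with an LLM call is what makes the memory decisions transparent and customisable, and also what makes them inherit the model's opacity.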

Letta is open source and well-suited to teams building on LangGraph or wanting LLM-driven memory management. It is less suited to teams that want a drop-in memory API without modifying agent architecture.

Strengths: open source, LLM-driven memory management, tiered memory model, active development.

Limitations: no published LongMemEval results, requires an LLM call for every memory operation, memory decisions inherit LLM opacity, limited outside LangGraph.

Best for: teams already committed to LangGraph who want transparent, customisable memory management at the agent level.

Honcho

Honcho focuses on implicit preference learning: inferring what users want from how they behave rather than requiring explicit instructions. It scored 92.6% on LongMemEval using Gemini 3 Pro, the third-highest published result after M-1 and Mem0.

Honcho is open source and positioned toward personalisation use cases where learning user patterns implicitly is more valuable than explicit memory storage.

Strengths: implicit preference learning, solid benchmark result, open source.

Limitations: narrower feature scope than broader platforms, a smaller community and narrower integration coverage than Mem0.

Best for: personalisation agents where implicit preference learning is the primary requirement.

A note on Pinecone and vector databases

Pinecone, Weaviate, Qdrant, and pgvector are retrieval infrastructure, not memory platforms. They store embeddings and retrieve by similarity. Using a vector database directly for agent memory means building everything else yourself: query decomposition, temporal salience, contradiction resolution, entity resolution, importance scoring, and cross-memory coherence. These are non-trivial engineering problems that a dedicated memory platform handles out of the box. See why a vector database is not a memory system for a full treatment.
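A toy example shows the gap. Pure similarity search returns two contradictory facts and, in this case, ranks the stale one first, because nothing in the retrieval step models time or supersession. The hand-made three-dimensional embeddings are an assumption standing in for a real embedding model and vector database.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Toy 3-d embeddings; a real system would use a learned embedding model.
store = [
    ("User's favourite editor is Vim (said in 2023)", [0.90, 0.10, 0.00]),
    ("User switched to VS Code (said last week)", [0.85, 0.15, 0.05]),
    ("User has a dog named Rex", [0.00, 0.20, 0.95]),
]


def top_k(query_emb: list[float], k: int = 2) -> list[str]:
    # Pure similarity search: nothing here knows that one fact supersedes another.
    ranked = sorted(store, key=lambda row: cosine(query_emb, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


print(top_k([1.0, 0.1, 0.0]))
```

Deciding that the VS Code fact replaces the Vim fact is exactly the contradiction and temporal logic that has to be built on top of the vector store.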

A note on LangChain Memory and LlamaIndex Memory

Framework memory modules handle short-term, in-session memory. They are not persistent memory systems. When the session ends, the memory ends. They are the right starting point for prototyping and the wrong infrastructure for production agents that need to remember things across sessions and over time. See when to outgrow your framework's built-in memory.
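The session boundary can be shown with a small sketch. This is not LangChain or LlamaIndex code: SessionBuffer and PersistentStore are hypothetical names, and SQLite stands in for whatever durable store a real system would use. Persistence alone is necessary but not sufficient; everything else a memory platform does still sits on top of it.

```python
import os
import sqlite3
import tempfile


class SessionBuffer:
    """In-process buffer: lives only as long as the object does."""

    def __init__(self) -> None:
        self.messages: list[str] = []


class PersistentStore:
    """Writes each memory to a SQLite file so it outlives the session."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS memories (content TEXT)")

    def add(self, content: str) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute("INSERT INTO memories (content) VALUES (?)", (content,))

    def all(self) -> list[str]:
        rows = self.conn.execute("SELECT content FROM memories ORDER BY rowid")
        return [row[0] for row in rows]

    def close(self) -> None:
        self.conn.close()


path = os.path.join(tempfile.mkdtemp(), "memory.db")

# Session 1: the user states a preference.
buffer = SessionBuffer()
buffer.messages.append("I prefer metric units")
store = PersistentStore(path)
store.add("I prefer metric units")
store.close()
del buffer  # session ends; the buffer's contents are gone

# Session 2: a fresh session starts with an empty buffer.
new_buffer = SessionBuffer()
store = PersistentStore(path)
print(new_buffer.messages)  # []
print(store.all())          # ['I prefer metric units']
store.close()
```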

Pricing comparison

Platform	Free tier	Paid starts at	Self-hosted
Exabase	Yes	$149/mo	No
Mem0	Yes	$249/mo (Pro)	Yes (open source)
Supermemory	Yes (generous)	Usage-based	Enterprise only
Zep	Yes	Contact	Yes (community edition)
Letta	Yes	$20/mo	Yes (open source)
Honcho	Yes	Contact	Yes (open source)

How to choose

If retrieval accuracy at production scale is the primary requirement: Exabase. The benchmark gap between M-1 and the next best system is meaningful at scale, and it was achieved with a cheaper model, which changes the cost structure at volume.

If open source and self-hosting are hard requirements: Mem0 for the largest community and broadest integrations, Zep if temporal reasoning is important, Letta if you want LLM-managed memory within LangGraph, Honcho if implicit preference learning is the priority.

If you are early stage and want the fastest path to working memory: Supermemory for MCP-native coding agent workflows, Mem0 for the largest body of examples and community knowledge.

If you need memory plus document storage and filesystems: Exabase is currently the only platform in this comparison that offers all three in a unified data layer.

If you are currently using a vector database for memory: read why a vector database is not a memory system first. The engineering cost of building the missing layers yourself is almost always higher than teams expect.

If you are currently using framework memory: read when to outgrow your framework's built-in memory. The session boundary is the wall most teams hit first.
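The first recommendation above says the gap between M-1 and the next best system is "meaningful at scale". Simple arithmetic on the published LongMemEval scores makes that concrete. The sketch assumes, which the benchmark itself does not establish, that benchmark error rates carry over to production traffic; the one-million-query volume is hypothetical.

```python
# Published LongMemEval scores from the results table (percent correct).
scores = {"M-1 (Exabase)": 96.4, "Mem0": 94.8}

queries = 1_000_000  # hypothetical monthly query volume

for system, accuracy in scores.items():
    error_rate = (100 - accuracy) / 100
    print(f"{system}: {error_rate:.1%} errors, ~{round(error_rate * queries):,} wrong answers")

# Relative reduction in errors, as opposed to the raw 1.6-point accuracy gap.
m1_err = (100 - scores["M-1 (Exabase)"]) / 100
mem0_err = (100 - scores["Mem0"]) / 100
print(f"Relative error reduction: {1 - m1_err / mem0_err:.0%}")
```

A 1.6-point accuracy gap corresponds to roughly 36,000 versus 52,000 wrong answers per million queries, about 31% fewer errors.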

Ship your first app in minutes.

Start for free

Read the docs

