Best agent memory platforms 2026

Every major agent memory platform compared: benchmarks, features, pricing, and who each one is actually for.

Last updated May 2026

Agent memory is the infrastructure layer that determines whether an AI agent is useful once or useful over time. This page covers the leading platforms, their benchmark results, feature scope, pricing, and honest tradeoffs.


The agent memory category has matured significantly in the past year. Most teams used to roll their own solutions on top of vector databases or rely on framework memory modules; there are now dedicated platforms purpose-built for the problem. Choosing between them requires understanding what each actually does, not just what each claims.

This page covers six platforms in depth and three more briefly. Where individual comparison pages exist, they are linked for deeper analysis.


How we evaluate

Benchmark accuracy is the primary technical signal. LongMemEval is the most rigorous public benchmark for conversational memory retrieval, testing six distinct memory capabilities across 115,000 tokens of conversational history. It is not a perfect benchmark, but it is the closest thing the field has to a shared standard. Scores are comparable only when the same model and methodology are used, which is rarely the case, so we note the model used alongside each score.

Feature scope covers what the platform actually does beyond core memory: document storage, graph memory, contradiction resolution, temporal reasoning, and entity resolution.

Pricing covers free tier availability, paid plan structure, and self-hosting options.

Open source matters for teams with data residency requirements, vendor lock-in concerns, or the desire to inspect and modify the codebase.

Architecture covers the retrieval approach at a high level: whether the system uses pure vector search, hybrid retrieval, graph-based retrieval, or a multi-signal approach.


LongMemEval results

| System | Model | Score (top-50) |
| --- | --- | --- |
| M-1 (Exabase) | Gemini 3 Flash | 96.4% |
| Mem0 | Gemini 3 Pro | 94.8% |
| Honcho | Gemini 3 Pro | 92.6% |
| HydraDB | Gemini 3 Pro | 90.79% |
| Supermemory | Gemini 3 Pro | 85.2% |

Note that all results except M-1 used Gemini 3 Pro. A larger model can compensate for weaker retrieval by extracting correct answers from noisy context. M-1's results with a smaller, cheaper model suggest the retrieval architecture is doing more of the work. See the full research paper for complete methodology.
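To make the gaps in the table easier to read, the short sketch below restates the published scores and derives two numbers from them: each system's distance from the top score, and the implied number of wrong answers per 1,000 questions. It adds no new measurements; the names `RESULTS` and `summarize` are illustrative only.

```python
# Scores exactly as listed in the LongMemEval table: (system, model, accuracy %).
RESULTS = [
    ("M-1 (Exabase)", "Gemini 3 Flash", 96.4),
    ("Mem0", "Gemini 3 Pro", 94.8),
    ("Honcho", "Gemini 3 Pro", 92.6),
    ("HydraDB", "Gemini 3 Pro", 90.79),
    ("Supermemory", "Gemini 3 Pro", 85.2),
]


def summarize(results):
    """Return rows of (system, model, gap to best in points, errors per 1,000)."""
    best = max(score for _, _, score in results)
    rows = []
    for system, model, score in results:
        gap = round(best - score, 2)                    # percentage points behind the leader
        errors_per_1000 = round((100 - score) * 10, 1)  # wrong answers per 1,000 questions
        rows.append((system, model, gap, errors_per_1000))
    return rows


if __name__ == "__main__":
    for system, model, gap, errors in summarize(RESULTS):
        print(f"{system:<15} {model:<15} gap={gap:>5} pts  errors/1000={errors}")
```

Because the scores were produced with different models, the gap column describes the published numbers rather than a like-for-like ranking.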

Supermemory claims benchmark leadership on its homepage and GitHub. These results are self-reported and have not been independently verified by a third party.


Feature matrix

| Feature | Exabase | Mem0 | Supermemory | Zep | Letta | Honcho |
| --- | --- | --- | --- | --- | --- | --- |
| LongMemEval score | 96.4% | 94.8% | 85.2% (claimed) | Not published | Not published | 92.6% |
| Memory API | Yes | Yes | Yes | Yes | Yes | Yes |
| Document storage | Yes | No | No | No | No | No |
| Bases (cloud filesystems) | Yes | No | No | No | No | No |
| Graph memory | Yes | Pro only ($249/mo) | Yes | Yes | No | No |
| Contradiction resolution | Yes | Partial | Yes | Yes | No | No |
| Temporal reasoning | Yes | Partial | Partial | Yes | No | Partial |
| Entity resolution | Yes | Partial | Partial | Yes | No | No |
| Sub-300ms retrieval | Yes | Not published | Yes (claimed) | Not published | Not published | Not published |
| Open source | No | Yes | Partial | Yes | Yes | Yes |
| Self-hostable | No | Yes | Enterprise only | Yes | Yes | Yes |
| Free tier | Yes | Yes | Yes | Yes | Yes | Yes |
| Pricing | Scale $149/mo | Pro $249/mo | Usage-based | Contact | $20-200/mo | Contact |


Platform overviews

Exabase

Exabase is a managed memory API built for production AI agents. Its core engine, M-1, implements a multi-signal retrieval pipeline combining semantic search and lexical precision with temporal salience, importance scoring, and cross-memory coherence. Rather than storing raw conversation logs and running similarity search over them, Exabase builds a dynamic knowledge graph that evolves as new information comes in. Contradictions are resolved at the storage layer. Entity resolution links fragmented references automatically. Memory drift is prevented by tracking temporal relationships explicitly.
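As a rough illustration of what "multi-signal" ranking means in general, the sketch below blends four signals into one score: semantic similarity, lexical overlap, recency, and importance. It is not Exabase's implementation. The weights, the 30-day half-life, the toy embeddings, and every name here (`Memory`, `score`, `retrieve`) are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]  # toy vector; a real system would use an embedding model
    age_days: float         # time since the memory was written
    importance: float       # 0..1, assumed to be assigned at write time


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query, text):
    """Fraction of query tokens that also appear in the memory text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def recency(age_days, half_life_days=30.0):
    """Exponential decay: a memory loses half its recency every half-life."""
    return 0.5 ** (age_days / half_life_days)


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"semantic": 0.4, "lexical": 0.2, "recency": 0.2, "importance": 0.2}


def score(query, query_embedding, memory):
    signals = {
        "semantic": cosine(query_embedding, memory.embedding),
        "lexical": lexical_overlap(query, memory.text),
        "recency": recency(memory.age_days),
        "importance": memory.importance,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def retrieve(query, query_embedding, memories, k=3):
    ranked = sorted(memories, key=lambda m: score(query, query_embedding, m), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    memories = [
        Memory("user lives in Berlin", [1.0, 0.0], age_days=400.0, importance=0.5),
        Memory("user lives in Lisbon", [1.0, 0.0], age_days=2.0, importance=0.5),
    ]
    for m in retrieve("where does the user live", [1.0, 0.0], memories):
        print(m.text)
```

In this toy data the two memories are identical on semantic and lexical signals, so the recency term is what ranks the newer fact first. That is the kind of case pure similarity search cannot separate.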

Alongside memory, Exabase offers document storage via the Resources API and isolated cloud filesystems via Bases, making it a full data layer for agents rather than a memory-only solution.

M-1 leads LongMemEval at 96.4% using Gemini 3 Flash, a smaller and cheaper model than competitors used. The methodology is fully published and reproducible.

Strengths: benchmark-leading retrieval accuracy, full data layer including document storage and Bases, sub-300ms retrieval, contradiction and temporal resolution built in.

Limitations: no open source option, no self-hosted path, connector coverage currently narrower than some competitors.

Best for: production AI agents where retrieval accuracy and cost efficiency matter at scale. Teams that want a unified data layer without assembling multiple tools.

Full comparisons: Exabase vs Mem0 · Exabase vs Supermemory


Mem0

Mem0 is the default reference point for agent memory in 2025 and 2026. With around 48,000 GitHub stars and $24 million in funding, it has the largest developer community of any standalone memory platform. It offers a three-tier memory system covering user, session, and agent scopes, backed by a hybrid store combining vectors, graph relationships, and key-value lookups.

Mem0 scores 94.8% on LongMemEval using Gemini 3 Pro. As noted in Exabase's research paper, the benchmarking script Mem0 published contained question-category-specific prompt templates designed to improve scores on particular question types. Exabase removed these and used a uniform prompt, achieving a higher score with a cheaper model.

Graph memory is available on the Pro plan at $249 per month. The core product is open source and self-hostable.

Strengths: largest community, broadest framework integrations, open source, self-hostable, AWS Strands native integration.

Limitations: graph memory paywalled at $249/mo, no document storage or filesystem layer, temporal reasoning partial.

Best for: teams that weight open source, community ecosystem, and framework integration breadth. Early-stage projects benefiting from a large body of existing examples and integrations.

Full comparison: Exabase vs Mem0


Supermemory

Supermemory is a relatively new managed memory API with a generous free tier and strong MCP integrations for Claude Code, Cursor, and Windsurf. It claims benchmark leadership on LongMemEval, LoCoMo, and ConvoMem on its homepage. These results are self-reported and have not been independently verified. Third-party evaluations have returned materially lower scores than Supermemory's published figures.

On LongMemEval, Supermemory scores 85.2% using Gemini 3 Pro, the largest gap from M-1 of any system in the comparative table. An 11.2-point deficit with a more expensive model points to fundamental differences in retrieval architecture rather than a tuning gap. See why vector search breaks at scale for the architectural reasons behind this kind of gap.

Supermemory offers connectors for Google Drive, Gmail, Notion, and GitHub, which is broader connector coverage than most competitors. Self-hosting requires an enterprise agreement.

Strengths: generous free tier, MCP-native integrations for coding agents, broad connectors, fast setup.

Limitations: largest benchmark gap in the field, self-reported results not independently verified, self-hosting enterprise only, closed source core.

Best for: early-stage projects and coding agent workflows where MCP integrations and speed of setup matter more than benchmark accuracy.

Full comparison: Exabase vs Supermemory

Zep

Zep is a memory platform built around a temporal knowledge graph called Graphiti. Where most memory systems treat facts as timestamped snapshots, Zep's graph stores fact validity windows: it knows not just when something was said but for how long it remained true. This makes it particularly strong on temporal reasoning tasks.
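The idea of a fact validity window can be sketched in a few lines. The example below is a minimal illustration of the concept, not Graphiti's schema or API: the `Fact` and `TemporalStore` names, the fields, and the rule that a new assertion closes the previous window are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still considered true


class TemporalStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, value, on):
        """Record a new fact and close the window of the fact it supersedes."""
        for fact in self.facts:
            same_key = (fact.subject, fact.predicate) == (subject, predicate)
            if same_key and fact.valid_to is None:
                fact.valid_to = on
        self.facts.append(Fact(subject, predicate, value, valid_from=on))

    def value_at(self, subject, predicate, when):
        """Return the value that was valid on the given date, or None."""
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if fact.valid_from <= when and (fact.valid_to is None or when < fact.valid_to):
                return fact.value
        return None


if __name__ == "__main__":
    store = TemporalStore()
    store.assert_fact("alice", "employer", "Acme", on=date(2023, 1, 10))
    store.assert_fact("alice", "employer", "Globex", on=date(2025, 3, 1))
    print(store.value_at("alice", "employer", date(2024, 6, 1)))
    print(store.value_at("alice", "employer", date(2025, 6, 1)))
```

A store of timestamped snapshots could say when each employer was mentioned. The window makes it possible to ask what was true on a particular date.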

Zep is open source with a self-hosted community edition and a managed cloud offering. It has published strong results on temporal sub-tasks and is a credible option for agent workflows where temporal reasoning is the primary requirement.

Strengths: temporal knowledge graph architecture, open source, self-hostable, strong on temporal reasoning specifically.

Limitations: no published overall LongMemEval score, no document storage layer, advanced features cloud-only.

Best for: agents with complex temporal reasoning requirements, teams that want open source with a managed cloud option.


Letta

Letta is an agent framework with a tiered memory model inspired by operating system memory architecture. The agent manages its own memory tiers, deciding what stays in working memory versus long-term storage. This makes memory management transparent and customisable at the agent level.
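The tiering idea can be shown with a small sketch: a bounded working memory that pages older items out to archival storage and pages them back in on recall. This is not Letta's API. In particular, the eviction decision here is a fixed oldest-first rule standing in for the LLM-driven decision described above, and the `TieredMemory` class and its methods are hypothetical.

```python
from collections import deque


class TieredMemory:
    def __init__(self, working_capacity=3):
        self.working = deque()  # small, always-in-context tier
        self.archive = []       # larger, searched-on-demand tier
        self.working_capacity = working_capacity

    def remember(self, item):
        """Add to working memory, paging the oldest items out when full."""
        self.working.append(item)
        while len(self.working) > self.working_capacity:
            self.archive.append(self.working.popleft())

    def recall(self, keyword):
        """Search the archive and page matching items back into working memory."""
        hits = [item for item in self.archive if keyword.lower() in item.lower()]
        for hit in hits:
            self.archive.remove(hit)
            self.remember(hit)
        return hits


if __name__ == "__main__":
    memory = TieredMemory(working_capacity=2)
    memory.remember("user prefers dark mode")
    memory.remember("user is based in Oslo")
    memory.remember("user asked about invoices")
    print(list(memory.working), memory.archive)
    memory.recall("dark")
    print(list(memory.working), memory.archive)
```

Replacing the oldest-first rule with a model call is what makes the real approach flexible, and also what makes its decisions harder to inspect.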

Letta is open source and well-suited to teams building on LangGraph or wanting LLM-driven memory management. It is less suited to teams that want a drop-in memory API without modifying agent architecture.

Strengths: open source, LLM-driven memory management, tiered memory model, active development.

Limitations: no published LongMemEval results, requires LLM for all memory operations, memory decisions inherit LLM opacity, limited outside LangGraph.

Best for: teams already committed to LangGraph who want transparent, customisable memory management at the agent level.


Honcho

Honcho focuses on implicit preference learning: inferring what users want from how they behave rather than requiring explicit instructions. It scored 92.6% on LongMemEval using Gemini 3 Pro, the third highest published result after M-1 and Mem0.

Honcho is open source and positioned toward personalisation use cases where learning user patterns implicitly is more valuable than explicit memory storage.

Strengths: implicit preference learning, solid benchmark result, open source.

Limitations: narrower feature scope than broader platforms, less community and integration breadth than Mem0.

Best for: personalisation agents where implicit preference learning is the primary requirement.


A note on Pinecone and vector databases

Pinecone, Weaviate, Qdrant, and pgvector are retrieval infrastructure, not memory platforms. They store embeddings and retrieve by similarity. Using a vector database directly for agent memory means building everything else yourself: query decomposition, temporal salience, contradiction resolution, entity resolution, importance scoring, and cross-memory coherence. These are non-trivial engineering problems that a dedicated memory platform handles out of the box. See why a vector database is not a memory system for a full treatment.
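A toy example shows why similarity alone falls short. The sketch below ranks stored statements by cosine similarity over hand-written three-dimensional vectors; the vectors, the `STORE` contents, and the `search` function are invented for illustration and do not correspond to any vector database's API.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-written toy embeddings. The first two statements contradict each other
# but sit almost on top of one another in vector space.
STORE = [
    ("User's favorite editor is Vim", [0.90, 0.10, 0.10]),        # older statement
    ("User switched to VS Code last month", [0.88, 0.12, 0.10]),  # newer statement
    ("User's invoice is due on Friday", [0.00, 0.20, 0.90]),
]


def search(query_embedding, store, k=2):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_embedding, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # A query about the user's editor, embedded (by assumption) near both editor facts.
    for text in search([0.90, 0.10, 0.10], STORE):
        print(text)
```

Both editor statements come back, with the stale one ranked first, and nothing in the store says which is current. Deciding that is the job of the contradiction and temporal layers listed above.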


A note on LangChain Memory and LlamaIndex Memory

Framework memory modules handle short-term, in-session memory. They are not persistent memory systems. When the session ends, the memory ends. They are the right starting point for prototyping and the wrong infrastructure for production agents that need to remember things across sessions and over time. See when to outgrow your framework's built-in memory.
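The session boundary is easy to see in code. The sketch below contrasts an in-process message list with a store that writes to disk; both classes are hypothetical stand-ins written for this example, not LangChain or LlamaIndex classes, and a JSON file is used only because it is the simplest persistent medium to demonstrate.

```python
import json
import tempfile
from pathlib import Path


class SessionMemory:
    """Lives only as long as the Python object: when the session ends, it is gone."""

    def __init__(self):
        self.messages = []

    def add(self, message):
        self.messages.append(message)


class PersistentMemory:
    """Writes every message to disk so that a later session can reload it."""

    def __init__(self, path):
        self.path = Path(path)
        self.messages = json.loads(self.path.read_text()) if self.path.exists() else []

    def add(self, message):
        self.messages.append(message)
        self.path.write_text(json.dumps(self.messages))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        first_session = PersistentMemory(path)
        first_session.add("User prefers metric units")
        second_session = PersistentMemory(path)  # simulates a fresh process
        print(second_session.messages)
        print(SessionMemory().messages)          # a fresh session starts empty
```

Persistence is only the first missing piece. A file of past messages still needs retrieval, conflict handling, and pruning before it behaves like memory.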


Pricing comparison

| Platform | Free tier | Paid starts at | Self-hosted |
| --- | --- | --- | --- |
| Exabase | Yes | $149/mo | No |
| Mem0 | Yes | $249/mo (Pro) | Yes (open source) |
| Supermemory | Yes (generous) | Usage-based | Enterprise only |
| Zep | Yes | Contact | Yes (community edition) |
| Letta | Yes | $20/mo | Yes (open source) |
| Honcho | Yes | Contact | Yes (open source) |


How to choose

If retrieval accuracy at production scale is the primary requirement: Exabase. M-1 leads the next best system, Mem0, by 1.6 points on LongMemEval (96.4% vs 94.8%), a gap that compounds across large query volumes, and it was achieved with a cheaper model, which changes the cost structure at volume.

If open source and self-hosting are hard requirements: Mem0 for the largest community and broadest integrations, Zep if temporal reasoning is important, Letta if you want LLM-managed memory within LangGraph, Honcho if implicit preference learning is the priority.

If you are early stage and want the fastest path to working memory: Supermemory for MCP-native coding agent workflows, Mem0 for the largest body of examples and community knowledge.

If you need memory plus document storage and filesystems: Exabase is currently the only platform that offers all three in a unified data layer.

If you are currently using a vector database for memory: read why a vector database is not a memory system first. The engineering cost of building the missing layers yourself is almost always higher than teams expect.

If you are currently using framework memory: read when to outgrow your framework's built-in memory. The session boundary is the wall most teams hit first.

