Last updated May 2026
Supermemory is a relatively new memory API for AI agents. It offers a free tier, MCP integrations for Claude Code, Cursor, and Windsurf, and claims benchmark leadership on LongMemEval, LoCoMo, and ConvoMem. Here is what the numbers actually show.
Comparison Table
| Feature | Exabase | Supermemory |
|---|---|---|
| LongMemEval score | 96.4% (Gemini 3 Flash) | 85.2% (Gemini 3 Pro) |
| Memory API | Yes | Yes |
| Filesystem / document storage | Yes | No |
| Extraction | Yes | Yes |
| Bases (cloud filesystems) | Yes | No |
| Graph memory | Yes | Yes |
| Contradiction resolution | Yes | Yes |
| Temporal reasoning | Yes | Partial |
| Sub-300ms retrieval | Yes | Yes (claimed) |
| Open source | No | Partial |
| Self-hostable | No | Enterprise only |
| Pricing | | Free tier, usage-based |
What is Supermemory?
Supermemory is a managed memory API that extracts facts from conversations, builds user profiles, handles contradictions, and retrieves context at query time. It supports RAG alongside memory in a single query, and offers connectors for Google Drive, Gmail, Notion, and GitHub. MCP integrations make it straightforward to add to Claude Code, Cursor, and Windsurf. Self-hosting is available on enterprise plans only. Benchmark results are self-reported and have not been independently verified by a third party.
What is Exabase?
Exabase is a managed Memory API built for production AI agents. Its core engine, M-1, handles the full memory pipeline: chunking, enrichment, relationship tracking, contradiction resolution, temporal weighting, and hybrid retrieval. Rather than storing raw conversation logs and running similarity search over them, Exabase builds a dynamic knowledge graph that evolves as new information comes in. Contradictions get resolved. Stale facts get updated. The context returned to your agent is accurate, not just adjacent.
Alongside memory, Exabase offers search, extraction, and Bases (cloud filesystems) as part of a unified data layer for agents. Graph memory is included across all plans.
Key Differences
The benchmark gap
On LongMemEval, M-1 scores 96.4%. Supermemory scores 85.2%. That is an 11-point gap. It is the largest in any head-to-head comparison we have run.
What makes it more significant is the model each system used. Supermemory used Gemini 3 Pro. Exabase used Gemini 3 Flash, a substantially smaller and cheaper model. A larger model can compensate for weaker retrieval by extracting correct answers from noisy or partially relevant context. Supermemory had that advantage and still trailed by 11 points.
A gap of this size, with a more expensive model on the losing side, is not a tuning problem. It is not something that prompt optimisation or a better embedding model closes. It points to fundamental differences in retrieval architecture: how memories are chunked, indexed, ranked, and assembled before they reach the model. The full methodology is published at exabase.io/research.
Benchmark claims
Supermemory claims on its homepage and GitHub to be number one on LongMemEval, LoCoMo, and ConvoMem. These results are self-reported. No independent third-party verification has been published. Third-party sources that have run their own evaluations return materially lower scores than Supermemory's published figures. Our results are published in full, methodology included, with the prompt and results JSON available for anyone to reproduce at exabase.io/research.
Cost at scale
M-1 achieves a higher score with Gemini 3 Flash than Supermemory achieves with Gemini 3 Pro. In production, the cost difference between these models is substantial and compounds across millions of queries. Better retrieval with a cheaper model is not a marginal improvement. It is a different cost structure entirely.
Temporal reasoning
Supermemory handles some temporal updates but does not fully resolve contradictions across time at the retrieval layer. When a fact changes, earlier versions can persist in the store and surface alongside newer ones. Exabase tracks temporal relationships explicitly, resolving contradictions before context reaches the model rather than leaving resolution to downstream inference.
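The idea of resolving contradictions before context reaches the model can be illustrated with a minimal sketch. This is not Exabase's or Supermemory's implementation: the `Fact` type, its fields, and `resolve_latest` are hypothetical names chosen for illustration, and a real system would likely weigh more than a timestamp. The sketch keeps only the newest value for each (subject, attribute) pair, so a superseded fact never appears alongside its replacement.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    """Hypothetical memory record, for illustration only."""
    subject: str      # who or what the fact is about
    attribute: str    # which property it describes
    value: str
    observed_at: datetime


def resolve_latest(facts: list[Fact]) -> list[Fact]:
    """Keep only the most recent value for each (subject, attribute) pair."""
    latest: dict[tuple[str, str], Fact] = {}
    for fact in facts:
        key = (fact.subject, fact.attribute)
        current = latest.get(key)
        if current is None or fact.observed_at > current.observed_at:
            latest[key] = fact
    # Return surviving facts in chronological order.
    return sorted(latest.values(), key=lambda f: f.observed_at)


# The user moved from Berlin to Lisbon; both versions are in the store.
store = [
    Fact("user", "city", "Berlin", datetime(2024, 3, 1)),
    Fact("user", "employer", "Acme", datetime(2024, 5, 10)),
    Fact("user", "city", "Lisbon", datetime(2025, 1, 15)),
]

context = resolve_latest(store)
print([(f.attribute, f.value) for f in context])
```

With the sample store, the stale "Berlin" entry is dropped and only the current city is passed on. Without a step like this, both values would be retrieved and the model would have to guess which one is current.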
Feature scope
Both products offer memory, extraction, and graph memory. Supermemory adds connectors for Google Drive, Gmail, Notion, and GitHub, which Exabase does not currently match on breadth. Exabase adds Bases, isolated cloud filesystems for agents, which Supermemory does not offer. Self-hosting on Supermemory requires an enterprise agreement. Exabase is a managed API with no self-hosted path.
When to Use Each
Use Exabase if retrieval accuracy is the primary requirement. The benchmark gap is large enough that at production scale it translates directly into fewer hallucinations, more reliable context, and lower model costs. It also fits if you want a unified data layer that includes memory and Bases without assembling multiple tools, and you are comfortable with a managed API.
Use Supermemory if you need MCP-native integrations for coding agents and want to get started quickly on a generous free tier. It also fits if you need connectors for Google Drive, Gmail, or Notion out of the box, or if you are at an early stage where benchmark accuracy differences are less material than speed of integration.
Why Developers Move from Supermemory to Exabase
Retrieval accuracy became a production problem. The gap between 85% and 96% is invisible in a demo. In production, across thousands of queries, it surfaces as wrong context, missed facts, and hallucinations that are hard to attribute. Teams running Supermemory at scale encounter a ceiling that cannot be tuned away.
Self-reported benchmarks did not hold up. Supermemory's homepage claims benchmark leadership. Independent evaluations return lower numbers. Teams that ran their own evaluations found the gap between claimed and actual performance significant enough to look elsewhere.
The model cost did not justify the accuracy. Getting the best results from Supermemory requires a Pro-tier model. Exabase achieves better results with Flash. At scale that difference compounds in both directions: lower model spend and higher retrieval quality.
They needed Bases. Document storage and isolated tenant containers for agent workloads are not something Supermemory covers. Teams that outgrew memory-only infrastructure needed a second integration or a different provider.
FAQs
How does the 11-point benchmark gap translate in practice?
On LongMemEval's 500 questions, the gap between 96.4% and 85.2% means roughly 56 additional correct answers: 482 versus 426. In production, where agents handle thousands of queries, the gap compounds. Wrong context produces wrong answers. Wrong answers at scale produce support tickets, user churn, and hallucinations that are difficult to trace back to the retrieval layer.
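The arithmetic behind that figure is easy to check. The sketch below uses the two published scores and the 500-question count quoted above. With the unrounded scores the difference comes to 56 questions, in line with the rough figure in the answer. The projection to production traffic is an assumption made here for illustration: it presumes benchmark accuracy carries over unchanged to real queries, which it may not, and the 10,000-query volume is a hypothetical example.

```python
def correct_answers(accuracy_pct: float, questions: int) -> int:
    """Number of questions answered correctly at a given accuracy."""
    return round(accuracy_pct / 100 * questions)


EXABASE_PCT = 96.4        # published LongMemEval score (Gemini 3 Flash)
SUPERMEMORY_PCT = 85.2    # published LongMemEval score (Gemini 3 Pro)
QUESTIONS = 500           # LongMemEval question count

exabase_correct = correct_answers(EXABASE_PCT, QUESTIONS)
supermemory_correct = correct_answers(SUPERMEMORY_PCT, QUESTIONS)
gap_questions = exabase_correct - supermemory_correct

# Hypothetical projection: assumes benchmark accuracy transfers to production.
production_queries = 10_000
extra_wrong = round((EXABASE_PCT - SUPERMEMORY_PCT) / 100 * production_queries)

print(f"Correct on benchmark: {exabase_correct} vs {supermemory_correct}")
print(f"Difference: {gap_questions} questions")
print(f"Projected extra wrong answers per {production_queries} queries: {extra_wrong}")
```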
Are Supermemory's benchmark claims accurate?
Their results are self-reported and have not been independently verified. Third-party evaluations have returned materially lower scores. Our methodology, prompt, and results are published in full at exabase.io/research and are reproducible by anyone.
What is LongMemEval?
A public benchmark designed to evaluate long-term memory in conversational AI systems. It presents approximately 115,000 tokens of conversational history across multiple sessions and tests six distinct memory capabilities: single-session recall, preference tracking, assistant-provided information, multi-session reasoning, temporal reasoning, and knowledge update. It is the most widely used public benchmark for this class of problem.
Does Exabase have connectors for Google Drive, Gmail, and Notion?
Not currently. If connector breadth is a primary requirement, that is worth factoring into your evaluation.
Is Supermemory open source?
Partially. The MCP plugins and some tooling are open source. The core memory engine is not. Self-hosting requires an enterprise agreement.
Does Exabase work with my existing stack?
Exabase is model-agnostic and framework-agnostic. It is available through a REST API, Python and JavaScript SDKs, and MCP for Claude, Cursor, Windsurf, and other compatible tools. See the docs for setup guides and examples.
Can I migrate from Supermemory to Exabase?
Yes. Exabase provides an API that accepts memories in standard formats. See the docs or contact us to walk through the migration path for your specific setup.