Give an LLM a persistent memory. This tutorial walks through how the Personal Assistant example wires Exabase's Memory API into a chat interface so the model can search and store memories mid-conversation using AI SDK tool calling.
What it does
A chat-based assistant that can remember things across sessions. When you tell it something worth remembering, it stores a memory. When you ask a question that requires prior context, it searches its memory before answering. The UI also lets you browse, delete, and seed sample memories.
This is the most minimal "memory-augmented chatbot" pattern — the LLM decides when to search and when to store, and Exabase handles the persistence and retrieval.
How Exabase fits in
The entire integration is two AI SDK tools: searchMemory and addMemory.
searchMemory
The model calls this when it needs context. It performs a semantic search against all memories in the current Base:
```ts
searchMemory: tool({
  description:
    'Search saved memories for this workspace. Phrase the query as a full question.',
  inputSchema: z.object({
    query: z.string().min(1),
    limit: z.number().min(1).max(20).optional(),
  }),
  execute: async ({ query, limit }) => {
    const api = getExabase();
    const res = await api.memories.search(
      { query, limit: limit ?? 8 },
      { baseId },
    );
    return {
      ok: true,
      total: res.total,
      hits: res.hits.map((h) => ({
        id: h.id,
        name: h.name,
        content: h.content,
        score: h.score ?? null,
      })),
    };
  },
}),
```
The tool description is doing real work here — it tells the model when to search ("use before answering when recall might matter") and how to query ("phrase as a full question, not bare keywords"). This shapes how often and how well the model uses the tool.
addMemory
The model calls this when the user says something worth retaining:
```ts
addMemory: tool({
  description:
    'Save a verbatim memory. Use when the user asks you to remember something.',
  inputSchema: z.object({
    content: z.string().min(1),
  }),
  execute: async ({ content }) => {
    const api = getExabase();
    await api.memories.create(
      { source: "text", content: content.trim(), infer: false },
      { baseId },
    );
    return { ok: true };
  },
}),
```
That's it. Two tools, and the LLM handles the orchestration — deciding when to search, when to store, and how to use retrieved memories in its response.
The system prompt
The system prompt is what makes the model actually use the tools well. Here's the full prompt from the example:
```text
You are a helpful personal assistant. Be clear, concise, practical, and friendly.

Use the memories in this workspace as the primary source of personal context
(family details, work info, preferences, plans, and capabilities). If a question
might depend on prior context, call searchMemory first, and set query to a clear
question in natural language (as if you were asking the index what to retrieve),
not a list of search terms.

When the user asks you to remember something, or when a durable personal detail
should be retained for future chats, call addMemory with the exact detail in
plain language.

If memory results are missing or ambiguous, say so briefly and ask a clarifying
question rather than guessing.
```
Two things worth noting: the prompt explicitly tells the model to use natural-language questions for search queries (not keyword lists), and it tells the model to admit gaps rather than hallucinate. Both of these matter a lot for retrieval quality.
Wiring it together
The API route connects the system prompt, tools, and streaming in a few lines:
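A minimal sketch of what that route can look like, assuming AI SDK v5 and the OpenAI provider — `SYSTEM_PROMPT` and the two tool objects stand in for the definitions shown above, and the model name is illustrative:

```typescript
import { streamText, convertToModelMessages, stepCountIs, type UIMessage } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),                 // illustrative model choice
    system: SYSTEM_PROMPT,                   // the prompt shown above
    messages: convertToModelMessages(messages),
    tools: { searchMemory, addMemory },      // the two tools shown above
    stopWhen: stepCountIs(12),               // cap tool-calling rounds per turn
  });

  // Stream back to useChat on the client as a UIMessageStream.
  return result.toUIMessageStreamResponse();
}
```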
stopWhen: stepCountIs(12) caps the number of tool-calling rounds — the model can search and store multiple times per turn, but won't loop forever. The response streams back to the client as a UIMessageStream, which the frontend renders with useChat from the AI SDK.
Seeding sample memories
For demos, the app can populate a Base with sample personal memories:
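A sketch of what a seeding helper can look like — `seedBase` and `SAMPLE_MEMORIES` are hypothetical names (the strings are taken from the FAQ below), and the `memories.create` call mirrors the one in the addMemory tool:

```typescript
// Illustrative seed data — same shape as the sample memories described in the FAQ.
const SAMPLE_MEMORIES = [
  "User's name is Alex Rivera. Married to Sam Rivera. They have two kids: Mia (9) and Leo (6).",
  "User prefers meetings after 10:00 AM, blocks 3:00-5:00 PM for focused work.",
];

async function seedBase(baseId: string) {
  const api = getExabase();
  for (const content of SAMPLE_MEMORIES) {
    // Verbatim storage, no inference — identical to how the addMemory tool writes.
    await api.memories.create({ source: "text", content, infer: false }, { baseId });
  }
}
```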
This lets someone try the chat immediately without entering their own data.
Key Exabase APIs used in this example:
| API | Purpose |
| --- | --- |
| `bases.create` | Create an isolated workspace |
| `memories.search` | Semantic search over stored memories |
| `memories.create` | Store new memories from chat |
| `memories.list` | Browse memories in the UI sidebar |
| `memories.delete` | Remove individual memories |
Run it yourself
```sh
git clone https://github.com/futurebrowser/exabase-examples.git
cd
```
Add EXABASE_API_KEY and OPENAI_API_KEY to .env.local, open http://localhost:3000, and click New base.
FAQ
How does the model decide when to search vs. store?
It's all in the tool descriptions. The searchMemory description says "use before answering when recall might matter," and addMemory says "use when the user asks you to remember something." The model follows these instructions through standard tool-calling behavior. Tuning these descriptions is the main lever you have.
Why infer: false on every memory?
In this example, the user (or model) provides the exact text to store. There's no need for Exabase to run additional inference. If you wanted Exabase to auto-extract entities or generate richer metadata, you'd set infer: true.
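The contrast in call shape, sketched with illustrative content strings (the `infer: true` behavior is as described above — richer extraction on Exabase's side, same `create` signature):

```typescript
// What this example does: store exactly what was said, no extra processing.
await api.memories.create(
  { source: "text", content: "User prefers meetings after 10:00 AM.", infer: false },
  { baseId },
);

// Alternative: let Exabase extract entities and richer metadata from raw text.
await api.memories.create(
  { source: "text", content: "Met Sam at the offsite; follow up next Tuesday.", infer: true },
  { baseId },
);
```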
How is this different from ChatGPT's memory?
The memory here is scoped to an Exabase Base, not to a user account. You control the storage, can inspect every memory, and can build any UI on top of it. It's infrastructure, not a product feature — you own the data and the retrieval logic.
What does stopWhen: stepCountIs(12) do?
It caps the number of tool-calling rounds per turn. The model might search, read the results, search again with a refined query, then store something — that's 3 steps. The cap of 12 gives the model room to do multi-step reasoning without risking an infinite loop if something goes wrong.
What do the sample memories look like?
They're plain-text strings like "User's name is Alex Rivera. Married to Sam Rivera. They have two kids: Mia (9) and Leo (6)." and "User prefers meetings after 10:00 AM, blocks 3:00-5:00 PM for focused work." — short, factual, and written the way you'd want a search hit to read. This is a good template for designing your own memory content.
Can I use this with a framework other than Next.js?
The Exabase integration is just the @exabase/sdk package plus two tool definitions. There's nothing Next.js-specific about the memory layer. If you're using Express, Fastify, or any other backend, you can copy the tool functions and wire them into whatever AI SDK setup you prefer.