Build a Memory-Powered Personal Assistant with Exabase

Give an LLM a persistent memory. This tutorial walks through how the Personal Assistant example wires Exabase's Memory API into a chat interface so the model can search and store memories mid-conversation using AI SDK tool calling.



What it does

A chat-based assistant that can remember things across sessions. When you tell it something worth remembering, it stores a memory. When you ask a question that requires prior context, it searches its memory before answering. The UI also lets you browse, delete, and seed sample memories.

This is the most minimal "memory-augmented chatbot" pattern — the LLM decides when to search and when to store, and Exabase handles the persistence and retrieval.



How Exabase fits in

The entire integration is two AI SDK tools: searchMemory and addMemory.

searchMemory

The model calls this when it needs context. It performs a semantic search against all memories in the current Base:

searchMemory: tool({
  description:
    'Search saved memories for this workspace. Phrase the query as a full question.',
  inputSchema: z.object({
    query: z.string().min(1),
    limit: z.number().min(1).max(20).optional(),
  }),
  execute: async ({ query, limit }) => {
    const api = getExabase();
    const res = await api.memories.search(
      { query, limit: limit ?? 8 },
      { baseId },
    );

    return {
      ok: true,
      total: res.total,
      hits: res.hits.map((h) => ({
        id: h.id,
        name: h.name,
        content: h.content,
        score: h.score ?? null,
      })),
    };
  },
}),

The tool description is doing real work here: it tells the model how to query (phrase the query as a full question, not bare keywords), and, together with the system prompt, when to search (before answering anything that might depend on prior context). This shapes how often and how well the model uses the tool.

addMemory

The model calls this when the user says something worth retaining:

addMemory: tool({
  description:
    'Save a verbatim memory. Use when the user asks you to remember something.',
  inputSchema: z.object({
    content: z.string().min(1),
  }),
  execute: async ({ content }) => {
    const api = getExabase();
    await api.memories.create(
      { source: "text", content: content.trim(), infer: false },
      { baseId },
    );
    return { ok: true };
  },
}),

That's it. Two tools, and the LLM handles the orchestration — deciding when to search, when to store, and how to use retrieved memories in its response.

The system prompt

The system prompt is what makes the model actually use the tools well. Here's the full prompt from the example:

You are a helpful personal assistant. Be clear, concise, practical, and friendly.

Use the memories in this workspace as the primary source of personal context
(family details, work info, preferences, plans, and capabilities). If a question
might depend on prior context, call searchMemory first, and set query to a clear
question in natural language (as if you were asking the index what to retrieve),
not a list of search terms.

When the user asks you to remember something, or when a durable personal detail
should be retained for future chats, call addMemory with the exact detail in
plain language.

If memory results are missing or ambiguous, say so briefly and ask a clarifying
question rather than guessing.

Two things worth noting: the prompt explicitly tells the model to use natural-language questions for search queries (not keyword lists), and it tells the model to admit gaps rather than hallucinate. Both of these matter a lot for retrieval quality.
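
Concretely, the difference shows up in the arguments the model passes to searchMemory. Illustrative tool-call arguments (the specific question below just plays off the sample memories quoted in the FAQ):

// A full natural-language question, the way the prompt asks for:
{ query: "What are the names and ages of the user's kids?", limit: 8 }

// Bare keywords, which the prompt steers the model away from:
{ query: "kids names ages" }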

Wiring it together

The API route connects the system prompt, tools, and streaming in a few lines:

const tools = buildSupportMemoryTools(baseId);

const result = streamText({
  model: openai(resolveOpenAiModel()),
  system: personalAssistantSystemPrompt(baseId),
  messages: await convertToModelMessages(messages),
  tools,
  stopWhen: stepCountIs(12),
});

return result.toUIMessageStreamResponse();

stopWhen: stepCountIs(12) caps the number of tool-calling rounds — the model can search and store multiple times per turn, but won't loop forever. The response streams back to the client as a UIMessageStream, which the frontend renders with useChat from the AI SDK.
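
As a rough sketch of that client side, assuming AI SDK v5's @ai-sdk/react package and the default /api/chat transport (the example's actual component, and how it passes the baseId to the route, will differ):

'use client';

import { useState } from 'react';
import { useChat } from '@ai-sdk/react';

export function AssistantChat() {
  // Streams UIMessages from the chat route shown above.
  // Note: the real example also needs to get a baseId to the route; that wiring is omitted here.
  const { messages, sendMessage, status } = useChat();
  const [input, setInput] = useState('');

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role}:</strong>{' '}
          {message.parts.map((part, index) =>
            part.type === 'text' ? <span key={index}>{part.text}</span> : null,
          )}
        </div>
      ))}
      <form
        onSubmit={(event) => {
          event.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={(event) => setInput(event.target.value)}
          placeholder="Ask something, or tell the assistant to remember it"
          disabled={status !== 'ready'}
        />
      </form>
    </div>
  );
}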

Seeding sample memories

For demos, the app can populate a Base with sample personal memories:

for (const content of SAMPLE_PERSONAL_MEMORIES) {
  await api.memories.create(
    { source: "text", content, infer: false },
    { baseId },
  );
}

This lets someone try the chat immediately without entering their own data.
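
SAMPLE_PERSONAL_MEMORIES is just an array of short factual strings. A sketch of its shape, using the two samples quoted in the FAQ below (the example ships its own, longer list):

// Illustrative shape only; see the FAQ below for what good memory content looks like.
export const SAMPLE_PERSONAL_MEMORIES: string[] = [
  "User's name is Alex Rivera. Married to Sam Rivera. They have two kids: Mia (9) and Leo (6).",
  "User prefers meetings after 10:00 AM, blocks 3:00-5:00 PM for focused work.",
];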



Key Exabase APIs used in this example:

API                Purpose
bases.create       Create an isolated workspace
memories.search    Semantic search over stored memories
memories.create    Store new memories from chat
memories.list      Browse memories in the UI sidebar
memories.delete    Remove individual memories
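
memories.search and memories.create appear in the tool code above; the other three back the base-creation flow and the sidebar. As a rough sketch of how they might be called, with the signatures of bases.create, memories.list, and memories.delete assumed by analogy to the two calls shown above (check the SDK docs for the exact shapes):

const api = getExabase();

// Assumption: bases.create takes a name and returns the new base with its id.
const base = await api.bases.create({ name: 'personal-assistant-demo' });

// Assumption: list and delete are scoped to a base the same way search and create are.
const listed = await api.memories.list({}, { baseId: base.id });
await api.memories.delete('mem_123' /* placeholder memory id */, { baseId: base.id });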



Run it yourself

git clone https://github.com/futurebrowser/exabase-examples.git
cd

Add EXABASE_API_KEY and OPENAI_API_KEY to .env.local, open http://localhost:3000, and click New base.
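
A minimal .env.local covering the two variables mentioned above (any other settings the example reads are not shown here):

EXABASE_API_KEY=your-exabase-api-key
OPENAI_API_KEY=your-openai-api-key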



FAQ

How does the model decide when to search vs. store?

It's all in the instructions the model sees: the searchMemory description tells it to phrase queries as full questions, the system prompt tells it to call searchMemory before answering anything that might depend on prior context, and the addMemory description says to use it when the user asks you to remember something. The model follows these through standard tool-calling behavior, so tuning the descriptions and the prompt is the main lever you have.


Why infer: false on every memory?

In this example, the user (or model) provides the exact text to store. There's no need for Exabase to run additional inference. If you wanted Exabase to auto-extract entities or generate richer metadata, you'd set infer: true.
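
For contrast, the same create call with inference turned on; only the flag changes:

// Ask Exabase to auto-extract entities and richer metadata instead of storing the text verbatim.
await api.memories.create(
  { source: "text", content, infer: true },
  { baseId },
);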


How is this different from ChatGPT's memory?

The memory here is scoped to an Exabase Base, not to a user account. You control the storage, can inspect every memory, and can build any UI on top of it. It's infrastructure, not a product feature — you own the data and the retrieval logic.


What does stopWhen: stepCountIs(12) do?

It caps the number of tool-calling rounds per turn. The model might search, read the results, search again with a refined query, then store something — that's 3 steps. The cap of 12 gives the model room to do multi-step reasoning without risking an infinite loop if something goes wrong.


What do the sample memories look like?

They're plain-text strings like "User's name is Alex Rivera. Married to Sam Rivera. They have two kids: Mia (9) and Leo (6)." and "User prefers meetings after 10:00 AM, blocks 3:00-5:00 PM for focused work." — short, factual, and written the way you'd want a search hit to read. This is a good template for designing your own memory content.


Can I use this with a framework other than Next.js?

The Exabase integration is just the @exabase/sdk package plus two tool definitions. There's nothing Next.js-specific about the memory layer. If you're using Express, Fastify, or any other backend, you can copy the tool functions and wire them into whatever AI SDK setup you prefer.
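
As a concrete sketch, a hypothetical Express route reusing the same pieces: buildSupportMemoryTools and personalAssistantSystemPrompt are the example's helpers (the import path here is a placeholder), the model id is an assumption, and pipeUIMessageStreamToResponse is AI SDK v5's helper for streaming over a Node response.

import express from 'express';
import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
// Placeholder path: copy these helpers out of the example.
import { buildSupportMemoryTools, personalAssistantSystemPrompt } from './memory-tools';

const app = express();
app.use(express.json());

app.post('/api/chat', (req, res) => {
  const { messages, baseId } = req.body;

  const result = streamText({
    model: openai('gpt-4o-mini'), // assumption: any OpenAI model id the AI SDK supports
    system: personalAssistantSystemPrompt(baseId),
    messages: convertToModelMessages(messages),
    tools: buildSupportMemoryTools(baseId),
    stopWhen: stepCountIs(12),
  });

  // Same UI message protocol as toUIMessageStreamResponse, piped to the Node response.
  result.pipeUIMessageStreamToResponse(res);
});

app.listen(3000);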
