Use cases

Self-maintaining knowledge bases

Workers maintain your knowledge base autonomously, re-extracting updated documents, tagging new content, and pruning what's gone stale, so your team doesn't have to.

Knowledge bases rot. The day you build one it's accurate, and from then on it decays: documents go out of date, new material arrives and doesn't get added, old content lingers long after it stopped being true. Keeping it current is a standing chore that falls to someone. Because it's nobody's favourite job, it slips, and the knowledge base quietly becomes less trustworthy until people stop relying on it.

Exabase Workers attack the rot directly. A Worker maintains a knowledge base autonomously, re-extracting documents that have changed, processing new content as it arrives, and pruning what's gone stale, all on a schedule, without anyone tending it. This page is about a knowledge base that maintains itself, so staying current stops being a task your team has to remember to do.

The problem

The trouble with a knowledge base isn't building it; it's keeping it true. Information has a shelf life. A policy gets revised, a product changes, a process is updated, and every document that described the old version is now subtly wrong. Nobody set out to let it rot; it rots because keeping it current is continuous manual work, and manual work competes with everything else.

The usual pattern is a knowledge base that's accurate at launch and decreasingly so thereafter. New material arrives and waits for someone to ingest it. Updated source documents sit without being reprocessed, so the knowledge base reflects last quarter's version. Outdated content stays in, because pruning is even less likely to get done than adding. Over time the proportion of stale and missing information grows. The cost isn't just irrelevance: an agent or a person querying the knowledge base gets confidently wrong answers drawn from material that should have been updated or removed. That's memory drift at the level of a whole corpus.

Doing the upkeep properly means a recurring pipeline: detect what's changed, re-extract it, add what's new, tag it so it's findable, remove what's dead. Building and operating that scheduling and processing machinery is real infrastructure work, and it's the kind that's easy to defer and easy to under-resource, which is exactly why so few knowledge bases get it.
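The "detect what's changed" step is the part the rest of the pipeline hangs on, and one common way to do it is to compare content fingerprints. The sketch below is illustrative only: `ChangeSet`, `fingerprint`, and `detect_changes` are hypothetical names, not part of any Exabase API, and it assumes you can read each source's current bytes and have stored a hash from the previous run.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Result of comparing the last known state with the current sources."""
    new: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def fingerprint(content: bytes) -> str:
    """Return a stable SHA-256 hex digest of a source's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(known: dict[str, str], current: dict[str, bytes]) -> ChangeSet:
    """Classify sources by comparing stored digests with current content.

    known   maps source id -> digest recorded on the previous run
    current maps source id -> raw bytes as they exist now
    """
    changes = ChangeSet()
    for source_id, content in sorted(current.items()):
        digest = fingerprint(content)
        if source_id not in known:
            changes.new.append(source_id)        # never ingested before
        elif known[source_id] != digest:
            changes.changed.append(source_id)    # needs re-extraction
        else:
            changes.unchanged.append(source_id)  # nothing to do
    # Anything we knew about that no longer exists is a pruning candidate.
    changes.removed = sorted(set(known) - set(current))
    return changes
```

Hashing the raw bytes keeps the check cheap and independent of file timestamps, at the price of treating any byte-level edit as a change, including ones that don't alter the meaning.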

What Exabase unlocks

With Workers doing the upkeep, the knowledge base stops being a thing that decays between manual interventions and becomes something that stays current on its own.

Updated documents get re-processed without anyone noticing they changed. A Worker checks the sources it's responsible for, re-extracts the ones that have been revised, and updates the corpus, so the knowledge base reflects the current version of a document rather than whatever it said when it was first ingested.

New content gets absorbed as it arrives. Rather than waiting in a queue for someone to add it, new material is processed, tagged, and made searchable on the Worker's schedule, so the knowledge base grows continuously instead of in occasional manual batches.

Stale content gets pruned. Material that's aged out or been superseded is identified and removed or flagged, so the corpus doesn't just accumulate forever, growing noisier with every addition. The result is a knowledge base that stays both complete and current, which is the state a manually maintained one is almost never in for long.
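What counts as stale is a policy decision, so it helps to see one written down. The sketch below assumes a simple rule that is ours, not Exabase's: anything explicitly superseded is removed, and anything not verified within a maximum age is flagged for review rather than deleted. `Entry`, `triage`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Entry:
    """A knowledge base item with just enough metadata to judge staleness."""
    entry_id: str
    last_verified: datetime
    superseded_by: Optional[str] = None


def triage(
    entries: list[Entry], now: datetime, max_age: timedelta
) -> tuple[list[str], list[str], list[str]]:
    """Split entries into (keep, flag, remove) id lists.

    Assumed policy: superseded entries are removed outright; entries older
    than max_age are flagged for review; everything else is kept.
    """
    keep: list[str] = []
    flag: list[str] = []
    remove: list[str] = []
    for entry in entries:
        if entry.superseded_by is not None:
            remove.append(entry.entry_id)
        elif now - entry.last_verified > max_age:
            flag.append(entry.entry_id)
        else:
            keep.append(entry.entry_id)
    return keep, flag, remove
```

Flagging aged content instead of deleting it is the more conservative choice: age alone doesn't prove something is wrong, whereas a newer version replacing it does.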

How it works

Four primitives combine, and on this page Workers take the lead: they are the maintenance engine, and Extract, Resources, and Deep Search are the substrate they keep current.

Workers

Workers are autonomous, scheduled agents that do the upkeep. A Worker runs on a schedule you set (daily, weekly, whatever the corpus needs) and carries out maintenance without being triggered: re-extracting changed sources, processing new arrivals, tagging content so it stays findable, and pruning what's stale. You create a Worker, give it the job and the schedule, and it runs on its own, using tools to act on the knowledge base. This is the difference between a knowledge base that needs a person to stay current and one that doesn't, and it's the same engine behind research agents with evolving knowledge.
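To make "a job and a schedule" concrete, here is a minimal model of the idea in plain Python. It is not the Exabase Worker API: `MaintenanceJob`, `is_due`, `next_run`, and the task names are hypothetical, and the real product handles scheduling for you. The sketch only shows the logic that lets a job run without anyone triggering it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MaintenanceJob:
    """Hypothetical description of an upkeep job: what to do and how often."""
    name: str
    interval: timedelta
    tasks: tuple[str, ...]


def is_due(job: MaintenanceJob, last_run: Optional[datetime], now: datetime) -> bool:
    """A job is due if it has never run or its interval has fully elapsed."""
    if last_run is None:
        return True
    return now - last_run >= job.interval


def next_run(job: MaintenanceJob, last_run: datetime) -> datetime:
    """Earliest time the job should run again after its last run."""
    return last_run + job.interval


# Example: a weekly upkeep job covering the four maintenance tasks.
weekly_upkeep = MaintenanceJob(
    name="kb-upkeep",
    interval=timedelta(weeks=1),
    tasks=("re-extract-changed", "ingest-new", "tag", "prune-stale"),
)
```

A faster-moving corpus would use a shorter `interval`; nothing else about the job needs to change.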

Extract

Extract is what the Worker uses to turn source documents, in any format, into clean text when they're added or changed. When a Worker finds a revised document, it re-extracts it through the same API that handled the initial ingestion, so updates flow in the same way new content does. The general extraction case is covered in document extraction at scale.

Resources

Resources are the knowledge base itself, the stored, organised content. Workers act on Resources: adding new ones, updating changed ones, tagging them, removing dead ones. Because Resources are indexed for Deep Search as they're stored, maintenance and searchability are the same flow, so anything the Worker adds or updates is immediately findable.

Deep Search

Deep Search is how the maintained knowledge base actually gets used. Querying it returns relevant content by meaning at the paragraph level, and because the corpus is kept current by Workers, the results reflect the present state rather than whatever was true at the last manual update. Search quality also holds as the corpus grows, avoiding the semantic collapse that degrades naive search at scale, so a knowledge base that keeps growing stays usable.

Example architecture

The structure is a maintained store with an autonomous process keeping it current.

Build the base. Extract your initial documents through Extract and store them as Resources, indexed for Deep Search. This is the knowledge base at launch.

Set the Worker to maintain it. Create a Worker on a schedule whose job is upkeep: check sources for changes and re-extract the revised ones, process and tag new content, prune what's stale. It runs without being triggered.

Query as normal. Your agents and users run Deep Search against the knowledge base, getting results that reflect the current state because the Worker has kept it current.

Scope if needed. For per-customer or per-team knowledge bases, put each in its own Base following the multi-tenant SaaS pattern, and give each its own maintaining Worker.

The corpus lives as searchable Resources, a Worker keeps it current on a schedule, and queries always hit the up-to-date version. The maintenance pipeline you'd otherwise build and run by hand is the Worker.
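The architecture above can be sketched as a single maintenance pass over an in-memory store. Everything here is a stand-in: the `store` dict plays the role of Resources, the `extract` and `tag` callables stand in for Extract and for whatever tagging the Worker does, and `Record` and `maintain` are hypothetical names. It also assumes "stale" simply means the source no longer exists.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Record:
    """One stored item: the digest it was built from, its text, its tags."""
    digest: str
    text: str
    tags: list[str]


def maintain(
    store: dict[str, Record],
    sources: dict[str, bytes],
    extract: Callable[[bytes], str],
    tag: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Run one upkeep pass and report which source ids were touched."""
    report: dict[str, list[str]] = {"added": [], "updated": [], "pruned": []}

    for source_id, raw in sorted(sources.items()):
        digest = hashlib.sha256(raw).hexdigest()
        existing = store.get(source_id)
        if existing is not None and existing.digest == digest:
            continue  # unchanged since the last pass, skip the work
        # New and changed sources take the same extract-then-tag path.
        text = extract(raw)
        store[source_id] = Record(digest=digest, text=text, tags=tag(text))
        report["updated" if existing is not None else "added"].append(source_id)

    # Prune records whose source has disappeared.
    for source_id in sorted(set(store) - set(sources)):
        del store[source_id]
        report["pruned"].append(source_id)

    return report
```

Calling `maintain` repeatedly with the latest snapshot of `sources` is the whole loop: the first call builds the base, and every later call adds, updates, and prunes only what has moved since the previous one.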

What compounds over time

This is a use case defined by what happens over time, because rot is a time problem and self-maintenance is the answer to it.

A manually maintained knowledge base is at its best the day it's built and degrades from there, with the gap between what it says and what's true widening as upkeep slips. A self-maintaining one moves the other way: it stays current as sources change, grows as new material is absorbed, and stays clean as stale content is pruned, so its trustworthiness holds rather than eroding. Because the underlying memory and indexing self-organise and search quality holds at scale, the knowledge base can grow large without becoming unwieldy or unreliable.

The contrast with the do-it-yourself approach is really a contrast in where the effort goes. Hand-maintained upkeep is recurring human work that scales with the size and churn of the corpus and tends to lose the competition for attention. A Worker is set up once and then absorbs that growth on its own, so the corpus gets bigger and stays fresher while the maintenance burden stays flat. Infrastructure that maintains itself is the only version of a knowledge base that stays trustworthy without a standing commitment of someone's time.

Who's building this

Teams running any knowledge base that changes over time: internal wikis and documentation, support knowledge bases, product knowledge, research corpora, anywhere content goes stale and someone is currently responsible for keeping it current. If your knowledge base is static and never changes, you don't need this; the moment it has churn, self-maintenance is what keeps it from rotting.

The closest neighbour is research agents with evolving knowledge, which applies the same autonomous-maintenance engine to a research corpus, and document extraction at scale covers the ingestion side. For building an internal tool on top of a maintained knowledge base, see internal tools powered by AI. The topic researcher example is a concrete Worker-driven build to learn from.

Get started

Start with the getting started guide, then read about Workers, creating Workers, and running Workers for the maintenance engine, plus worker tools for what a Worker can do. There's a free tier to build against.

FAQs

What does a Worker actually do to keep the knowledge base current?

It runs on a schedule and performs upkeep without being triggered: checks sources for changes and re-extracts the revised ones, processes and tags new content, and prunes material that's gone stale. You define the job and the schedule when you create the Worker.

How often does maintenance run?

On whatever schedule you set (daily, weekly, or otherwise), matched to how fast your corpus changes. A fast-moving knowledge base can be maintained frequently; a slower one less often.

Does it re-process documents that have changed, or only add new ones?

Both. A Worker processes new content as it arrives and re-extracts existing documents that have been updated, so the knowledge base reflects the current version of a source rather than the version that was first ingested.

What happens to outdated content?

It's pruned or flagged rather than left to accumulate, so the corpus stays clean instead of growing noisier with every addition. This is what stops a knowledge base from drifting toward confidently wrong answers drawn from stale material.

Will search quality hold as the knowledge base grows?

Yes. Deep Search is built to hold retrieval quality at scale, where naive search tends to suffer semantic collapse, so a knowledge base that keeps growing stays usable.

Can I run separate knowledge bases for different teams or customers?

Yes. Put each in its own Base for full isolation, following the multi-tenant SaaS pattern, and give each its own maintaining Worker.