HR and recruiting agents

Extract structured data from CVs, remember candidate interactions across every stage, and search the full pipeline, with per-role Bases keeping candidate pools isolated.


Recruiting runs on context that's scattered by design. A candidate's CV is one document, their interview feedback is several more, the scorecards live somewhere else, and the most useful detail is often a remark someone made in a debrief that never got written down properly. A recruiting agent has to pull all of that together, and most can't, because they have nowhere to extract it to, remember it in, or search it from.

The model can reason about a candidate perfectly well when the information is in front of it. The work is getting structured data out of the CVs, remembering a candidate across the stages of a process, searching the whole pipeline when you need to, and keeping each role's candidates cleanly separate. This page is about that infrastructure.

A scope note: this page is about the data infrastructure beneath a recruiting agent, not about hiring decisions, fairness, or the legal obligations that attach to recruitment in your jurisdiction. Those are matters for your own review.


The problem

The first difficulty is the CVs themselves. They arrive as PDFs and documents in every format imaginable, and before an agent can do anything useful with a candidate, the relevant information (experience, skills, history) has to come out in a structured form. Doing that reliably across thousands of differently formatted resumes is its own engineering problem.

The second is continuity across a process. A candidate moves through screening, interviews, and debriefs over weeks, and the context accumulates at each stage. An agent that forgets between stages makes everyone re-establish where the candidate stands, and loses the thread of what was said about them earlier. Trying to carry the whole history in the prompt hits the usual context window ceiling once a pipeline has any real volume. And candidates' situations change: an early concern gets resolved in a later round, a salary expectation shifts, a preference is updated. Keeping the current picture straight rather than stacking contradictory notes is an entity resolution problem, and an agent that surfaces a concern already addressed two rounds ago is suffering memory drift.

The third is isolation. A company recruits for many roles at once, and the candidate pool for one role generally shouldn't bleed into another. Keeping each role's candidates cleanly separated needs to be structural rather than a convention.


What Exabase unlocks

With extraction, candidate memory, and per-role isolation underneath, the agent can actually hold a pipeline together.

Drop in a CV and the agent gets structured data out of it (experience, skills, history) regardless of how the document was formatted, ready to work with rather than sitting as an unparsed file. A stack of resumes becomes a searchable, structured pool instead of a folder of PDFs.

The agent remembers each candidate across the whole process. It knows what was raised in screening, what each interviewer thought, what's still open, what the candidate is looking for. A debrief opens with the agent already holding the candidate's history rather than the panel reconstructing it. When something about the candidate changes, the agent reflects the update rather than carrying the stale version forward.

And the full pipeline becomes searchable. "Which candidates for this role have led a team through a migration?" returns actual matches across the pool, found by meaning rather than exact keywords, instead of someone reading every CV. Each role's pool stays its own, so the search is scoped to the candidates that belong to it.


How it works

Three primitives carry a recruiting agent: Extract for the documents, Memory for candidate continuity, and Bases for per-role isolation.

Extract

Extract turns CVs and other recruiting documents into clean, structured text through one API, whatever format they arrive in. That's the step that makes a resume usable, since nothing can be searched or remembered until the information has been read out of the document cleanly. It handles the document formats recruiting throws at it and returns text chunked and ready to store, search, or feed into memory. The general pattern is covered in document extraction at scale if extraction volume is the core of your need.
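To make "chunked and ready to store" concrete, here is a minimal sketch of what chunked CV text might look like on the consuming side. It does not call Exabase: the Chunk type, the chunk_text function, and the 90-character limit are assumptions made up for illustration, and the real output shape of Extract should be taken from its API reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One piece of extracted text, tagged with its source (hypothetical shape)."""
    source: str
    index: int
    text: str


def chunk_text(source: str, text: str, max_chars: int = 90) -> list[Chunk]:
    """Pack paragraphs into chunks without exceeding max_chars where possible.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(Chunk(source, len(chunks), current))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(source, len(chunks), current))
    return chunks


# Illustrative CV text, standing in for whatever extraction returns.
cv_text = (
    "Experience\nLed a team of six through a database migration.\n\n"
    "Skills\nPython, PostgreSQL, incident response.\n\n"
    "History\nStaff engineer since 2021."
)
chunks = chunk_text("cv-ada.pdf", cv_text)
```

Splitting on paragraph boundaries keeps each CV section readable on its own, which matters once chunks are searched or handed to a model individually.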

Memory

Exabase Memory holds the evolving picture of each candidate: what came up in screening, interviewer feedback, what the candidate is looking for, where they stand. You send in the interactions and Exabase extracts what's worth keeping across stages. The contradiction handling matters here because a candidate's picture changes through a process: when a concern is resolved or an expectation shifts, the new state supersedes the old, so the agent works from where the candidate actually is rather than an early impression. This is what makes it a memory layer rather than a stack of stage-by-stage notes.
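The "new state supersedes the old" behaviour can be illustrated with a small in-memory model. This is a sketch of the idea only, not Exabase's implementation or API: the Fact and CandidateMemory names are hypothetical, and keying facts by a named attribute is a simplifying assumption (the text above describes Exabase extracting what matters from the interactions you send, which this sketch does not attempt).

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A single remembered statement about a candidate (hypothetical shape)."""
    attribute: str
    value: str
    stage: str


@dataclass
class CandidateMemory:
    """Keeps one current value per attribute and retains what it replaced."""
    current: dict[str, Fact] = field(default_factory=dict)
    superseded: list[Fact] = field(default_factory=list)

    def remember(self, attribute: str, value: str, stage: str) -> None:
        previous = self.current.get(attribute)
        if previous is not None and previous.value != value:
            # The newer statement wins; the older one is archived, not stacked.
            self.superseded.append(previous)
        self.current[attribute] = Fact(attribute, value, stage)

    def picture(self) -> dict[str, str]:
        """The candidate's current state, one value per attribute."""
        return {name: fact.value for name, fact in self.current.items()}


memory = CandidateMemory()
memory.remember("salary_expectation", "120k", stage="screening")
memory.remember("notice_period_concern", "open", stage="screening")
memory.remember("salary_expectation", "135k", stage="onsite")
memory.remember("notice_period_concern", "resolved", stage="onsite")

picture = memory.picture()
```

After the onsite round, picture holds only the updated expectation and the resolved concern, while the screening-stage values sit in superseded for anyone who needs the trail.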

Bases

A Base is a fully isolated environment with its own memory, storage, and search. Create one per role, and each role's candidate pool lives in a space that contains only those candidates, with their memory and documents sealed off from every other role. The isolation is structural rather than a filter you apply, so one role's pool can't surface in another's search, and you get it from a single API call rather than building partitioning yourself. The same pattern underpins any multi-tenant SaaS product.
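The difference between structural isolation and a filter is easier to see in a toy model. In the sketch below each Base owns its own documents, so a search has no shared index to leak from. The Base and Workspace classes are hypothetical stand-ins written for this page, not the Exabase client, and the substring match is only a placeholder for real search.

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    """Stand-in for an isolated environment: it owns its documents outright."""
    role: str
    _documents: dict[str, str] = field(default_factory=dict, repr=False)

    def store(self, candidate: str, text: str) -> None:
        self._documents[candidate] = text

    def search(self, term: str) -> list[str]:
        # Only this Base's documents are reachable from here.
        needle = term.lower()
        return sorted(c for c, t in self._documents.items() if needle in t.lower())


class Workspace:
    """Hands out exactly one Base per role."""

    def __init__(self) -> None:
        self._bases: dict[str, Base] = {}

    def base_for(self, role: str) -> Base:
        if role not in self._bases:
            self._bases[role] = Base(role)
        return self._bases[role]


workspace = Workspace()
backend = workspace.base_for("backend-engineer")
design = workspace.base_for("product-designer")

backend.store("ada", "Led a migration from MySQL to PostgreSQL.")
design.store("grace", "Led a migration of the design system.")

backend_hits = backend.search("migration")
design_hits = design.search("migration")
```

Both CVs mention a migration, yet each search returns only the candidate stored in its own Base. With a shared store and a role filter, the same guarantee would depend on every query remembering to apply the filter.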


Example architecture

The structure is straightforward.

One Base per role, so each role's candidates (their documents, memory, and search) stay isolated from other roles.

As candidates enter, run their CVs through Extract to get structured text and store it as Resources in the role's Base, where it's indexed for search.

As the process unfolds, send screening notes, interview feedback, and debriefs to Memory within the role's Base, where Exabase extracts and keeps each candidate's evolving picture current.

When you need to, the agent runs a Deep Search across the role's pool to find candidates matching a requirement, and retrieves memory for a given candidate's full history before a debrief or decision point.

CVs flow in through extraction into a per-role Base, candidate context accumulates in memory across stages, and the agent searches the pool or recalls a candidate as needed. Each role's pool stays sealed in its own Base.
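The four steps above can be sketched as a single flow. Everything here is an in-memory stand-in under stated assumptions: RoleBase and RecruitingPipeline are hypothetical names, the extract callable is supplied by the caller, and a keyword match takes the place of Deep Search. None of it is the Exabase API; it only shows how the pieces relate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoleBase:
    """In-memory stand-in for one role's Base (hypothetical)."""
    role: str
    resources: dict[str, str] = field(default_factory=dict)     # candidate -> CV text
    memory: dict[str, list[str]] = field(default_factory=dict)  # candidate -> notes


class RecruitingPipeline:
    def __init__(self, extract: Callable[[bytes], str]) -> None:
        self._extract = extract
        self._bases: dict[str, RoleBase] = {}

    def open_role(self, role: str) -> RoleBase:
        # Step 1: one Base per role.
        if role not in self._bases:
            self._bases[role] = RoleBase(role)
        return self._bases[role]

    def add_cv(self, role: str, candidate: str, raw: bytes) -> None:
        # Step 2: extract the CV and store the text in the role's Base.
        self.open_role(role).resources[candidate] = self._extract(raw)

    def add_note(self, role: str, candidate: str, note: str) -> None:
        # Step 3: send stage notes to memory within the role's Base.
        self.open_role(role).memory.setdefault(candidate, []).append(note)

    def search(self, role: str, term: str) -> list[str]:
        # Step 4a: search only the role's pool (keyword match as a placeholder).
        pool = self.open_role(role).resources
        return sorted(c for c, text in pool.items() if term.lower() in text.lower())

    def recall(self, role: str, candidate: str) -> list[str]:
        # Step 4b: retrieve a candidate's history before a debrief.
        return list(self.open_role(role).memory.get(candidate, []))


# A trivial extractor for the example: treat the upload as UTF-8 text.
pipeline = RecruitingPipeline(extract=lambda raw: raw.decode("utf-8"))
pipeline.add_cv("backend-engineer", "ada", b"Led a team through a database migration.")
pipeline.add_cv("product-designer", "grace", b"Ran a design system migration.")
pipeline.add_note("backend-engineer", "ada", "Screening: strong on incident response.")
pipeline.add_note("backend-engineer", "ada", "Onsite: notice period concern resolved.")

matches = pipeline.search("backend-engineer", "migration")
history = pipeline.recall("backend-engineer", "ada")
```

Every method takes the role first, which keeps the role's Base as the unit everything else hangs off rather than an optional argument.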


What compounds over time

A recruiting agent built this way gets more useful as a pipeline runs and as the organisation hires over time.

Within a role, every stage adds to what the agent knows about each candidate, and the picture stays current as things change because the memory self-organises rather than accumulating contradictions. A candidate late in a process is one the agent understands in full context, not one whose history has to be pieced back together from scattered notes.

Across the organisation, the structured, searchable record of candidates and processes builds up. Past candidates remain findable for future roles, and the cost of getting a CV into usable form is paid once. The do-it-yourself alternative (a resume parser you maintain, a store you wire up, an isolation scheme you hope holds) gets harder to run as volume grows and still leaves you building the cross-stage memory yourself. Infrastructure that improves with use is the better bet when the pipeline only gets busier.


Who's building this

Teams building recruiting copilots, applicant-tracking tools, sourcing agents, and HR assistants: anywhere an agent works across candidate documents and a multi-stage process and needs to keep each role's pool separate.

For the extraction side, document extraction at scale covers processing CVs in volume, and the multi-tenant SaaS use case covers the per-role isolation pattern in general terms.


Get started

Start with the getting started guide, then read about extraction and submitting jobs for the CV side, and memory scoping to see how per-role isolation works. There's a free tier to build against. As with any recruiting deployment, questions of fairness and legal obligation are for your own review.


FAQs

Can it pull structured data from CVs in any format?

Yes. Extract reads CVs whatever format they arrive in and returns clean, structured text, so a stack of differently formatted resumes becomes a usable, searchable pool rather than a folder of unparsed files.


Does the agent remember a candidate across interview stages?

Yes. Screening notes, interview feedback, and debriefs go into Memory as the process unfolds, so the agent holds each candidate's full history and opens a debrief already aware of what came before.


How does it handle something about a candidate changing during the process?

The new state supersedes the old through entity resolution. A concern resolved in a later round, or a shifted expectation, updates the candidate's picture rather than coexisting with the earlier note, so the agent works from the current reality.


How are candidate pools for different roles kept separate?

Each role gets its own Base, which fully isolates its candidates, documents, and search from other roles. Operations scoped to a role's Base can only see that role's pool, so one role's candidates never appear in another's search.


Can I search the pipeline for candidates matching a requirement?

Yes. With CVs stored as Resources, Deep Search finds candidates by meaning, so a query like "candidates who have led a migration" returns matches even when the CV phrases it differently, scoped to the role's pool.


Does Exabase make hiring decisions or assess candidates?

No. Exabase is data infrastructure: extraction, memory, search, and isolation. How candidates are assessed and decided on, and whether a given use meets your fairness and legal obligations, is determined by you and your own review, not by Exabase.


Ship your first app in minutes.
