About extraction
The Extraction API lets you submit files or URLs for automated processing and receive structured metadata back: text content, thumbnails, document properties, web page previews, media attributes, and more. Each submission becomes an *extraction job* that progresses through a processing pipeline and exposes its results once complete.
How it works
- Submit a file or a URL with `POST /v2/extract`.
- Exabase processes the item asynchronously. The `state` field on the job reflects its current progress.
- Once the job reaches `completed`, the `extraction` object is populated with all available data.
- You can poll for completion or configure a webhook to be notified automatically.
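A polling loop for the steps above might look like the following minimal sketch. It rests on stated assumptions: `fetch_job` is a hypothetical callable standing in for however you retrieve a job (this page does not document the retrieval endpoint or response shape), the job is assumed to be a dict with a `state` key, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# States after which further polling is pointless (from the job-state table).
TERMINAL_STATES = {"completed", "failed", "cleaned"}


def wait_for_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job leaves pending/processing, then return it.

    `fetch_job` is a hypothetical callable that returns the latest job
    as a dict with at least a "state" key; wire it to your HTTP client.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["state"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['state']!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` lets tests replay canned states without waiting, for example `wait_for_job(lambda: next(states), sleep=lambda _: None)`. If you configure a webhook instead, this loop is unnecessary.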
Job states
| State | Meaning |
|---|---|
| `pending` | Queued, not yet picked up for processing |
| `processing` | Actively being processed |
| `completed` | Processing finished; extraction data available |
| `failed` | Processing failed; the job can be reprocessed |
| `cleaned` | Retention period expired; the job's stored files have been permanently deleted |
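If you model these states in client code, an enum keeps the string values in one place. The sketch below is illustrative, not an official SDK type: `JobState` and its helper properties are hypothetical names, and the helpers encode only what the state table says.

```python
from enum import Enum


class JobState(str, Enum):
    """Extraction job states, mirroring the job-state table."""

    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CLEANED = "cleaned"

    @property
    def in_progress(self) -> bool:
        # Queued or actively running: keep waiting.
        return self in (JobState.PENDING, JobState.PROCESSING)

    @property
    def extraction_available(self) -> bool:
        # The table documents extraction data only for completed jobs;
        # this sketch does not assume any data survives cleaning.
        return self is JobState.COMPLETED

    @property
    def can_reprocess(self) -> bool:
        # The table lists reprocessing only for failed jobs.
        return self is JobState.FAILED
```

Because the enum subclasses `str`, `JobState(job["state"])` parses a raw value and members compare equal to their strings.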
Extraction data
The `extraction` object on a completed job is split into four sections. Not all sections are present for every kind of item.
| Section | Available for | Key fields |
|---|---|---|
| `common` | All kinds | `mimeType`, `size`, `thumbnail`, `chunkCount` |
| `media` | Images, audio, video | `width`, `height`, `duration`, `ocr`, `transcript` |
| `document` | PDFs and documents | `pages`, `author`, `title`, `pdfRender` |
| `web` | Bookmarks and web pages | `title`, `siteName` |
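Since only `common` applies to every kind of item, client code should treat `media`, `document`, and `web` as optional. The sketch below assumes the `extraction` object has been decoded into a Python dict keyed by section name, using the field names from the table; `summarize_extraction` is a hypothetical helper, and the value types of fields such as `size` or `duration` are not specified on this page.

```python
from typing import Any, Dict, Optional


def summarize_extraction(extraction: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Pull a few headline fields out of an extraction dict.

    Sections other than `common` may be absent (or null), so every
    lookup falls back to None instead of raising KeyError.
    """
    common = extraction.get("common") or {}
    media = extraction.get("media") or {}
    document = extraction.get("document") or {}
    web = extraction.get("web") or {}
    return {
        "mimeType": common.get("mimeType"),
        # Prefer a document title, then fall back to a web page title.
        "title": document.get("title") or web.get("title"),
        "pages": document.get("pages"),
        "duration": media.get("duration"),
        "siteName": web.get("siteName"),
    }
```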
File retention
Storage files attached to extraction jobs are retained for 1 day from job creation. After that, the job is marked `cleaned` and its associated files are permanently deleted. Download or copy any files you need to keep before the retention window expires.
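To act before files are removed, you can compute each job's retention deadline from its creation time. This is a sketch under assumptions: the job's creation time is available as an ISO 8601 string (the field name is not given on this page, so the functions take the value directly), "1 day" is treated as 24 hours, and `retention_deadline` and `time_left` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window from this page; assumed here to mean 24 hours.
RETENTION = timedelta(days=1)


def retention_deadline(created_at: str) -> datetime:
    """Return when the retention window for a job's stored files ends.

    `created_at` is assumed to be an ISO 8601 timestamp such as
    "2024-05-01T12:00:00Z" or "2024-05-01T12:00:00+00:00".
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11.
    if created_at.endswith("Z"):
        created_at = created_at[:-1] + "+00:00"
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        # Assumption: treat timestamps without an offset as UTC.
        created = created.replace(tzinfo=timezone.utc)
    return created + RETENTION


def time_left(created_at: str, now: Optional[datetime] = None) -> timedelta:
    """Time remaining in the retention window (negative once it has passed)."""
    now = now or datetime.now(timezone.utc)
    return retention_deadline(created_at) - now
```

A scheduled task could, for example, copy files for any job whose `time_left` drops below an hour.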