About extraction

The Extraction API lets you submit files or URLs for automated processing and receive structured metadata back: text content, thumbnails, document properties, web page previews, media attributes, and more. Each submission becomes an extraction job that progresses through a pipeline and exposes its results once complete.

How it works

  1. Submit a file or a URL with POST /v2/extract.
  2. Exabase processes the item asynchronously. The state field on the job reflects the current progress.
  3. Once the job reaches completed, the extraction object is populated with all available data.
  4. You can poll for completion or configure a webhook to be notified automatically.
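
The polling half of this flow could be sketched roughly as follows. This is an illustration under stated assumptions, not an official client: `fetch_job` is a hypothetical function you supply that performs whatever request returns the job as a dict with a `state` field (the endpoint for reading a job is not covered on this page, so it is left out), and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# States after which the job will not change on its own, taken from the
# job states listed on this page.
TERMINAL_STATES = {"completed", "failed", "cleaned"}


def wait_for_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or the timeout elapses.

    fetch_job is a hypothetical caller-supplied function returning the job
    as a dict with a "state" key; it is not part of any documented client.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["state"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['state']!r} after {timeout}s")
        sleep(interval)
```

Passing the fetch function in keeps the loop independent of any HTTP library and makes it easy to exercise with a stub. A webhook, where configured, would replace this loop entirely.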

Job states

State       Meaning
pending     Queued, not yet picked up for processing
processing  Actively being processed
completed   Processing finished, extraction data available
failed      Processing failed; the job can be reprocessed
cleaned     Job and its stored files have been deleted
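
One way a client might act on these states is sketched below. The state strings come from the list above; the action labels (`wait`, `read_extraction`, `reprocess`, `gone`) are hypothetical names chosen for this example and are not part of the API.

```python
from enum import Enum


class JobState(str, Enum):
    """The five job states listed above."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CLEANED = "cleaned"


def next_action(state: str) -> str:
    """Map a job state to a client-side action label (labels are illustrative)."""
    s = JobState(state)  # raises ValueError for a state not listed above
    if s in (JobState.PENDING, JobState.PROCESSING):
        return "wait"             # still in the pipeline
    if s is JobState.COMPLETED:
        return "read_extraction"  # extraction data is available
    if s is JobState.FAILED:
        return "reprocess"        # the job can be reprocessed
    return "gone"                 # cleaned: job and stored files were deleted
```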

Extraction data

The extraction object on a completed job is split into four sections. Not all sections are present for every kind of item.

Section   Available for            Key fields
common    All kinds                mimeType, size, thumbnail, chunkCount
media     Images, audio, video     width, height, duration, ocr, transcript
document  PDFs and documents       pages, author, title, pdfRender
web       Bookmarks and web pages  title, siteName
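
Because sections can be absent, client code should read them defensively. The sketch below assumes the job is a parsed JSON dict whose `extraction` key holds the four sections under their own names; that nesting is an assumption for illustration, since the exact response shape is not shown here.

```python
from typing import Any, Dict, List

# Section names as listed above.
SECTIONS = ("common", "media", "document", "web")


def present_sections(job: Dict[str, Any]) -> List[str]:
    """Return the names of the sections that are populated on this job."""
    # Assumed shape: job["extraction"][<section name>] -> dict of fields.
    extraction = job.get("extraction") or {}
    return [name for name in SECTIONS if extraction.get(name) is not None]


def get_field(job: Dict[str, Any], section: str, field: str, default: Any = None) -> Any:
    """Read one field, returning default if the section or field is missing."""
    extraction = job.get("extraction") or {}
    return (extraction.get(section) or {}).get(field, default)
```

For a PDF, for example, `get_field(job, "media", "width")` would simply return `None` rather than raising, since the media section does not apply to documents.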

File retention

Storage files attached to extraction jobs are retained for 1 day from job creation. After that, the job is marked cleaned and its associated files are permanently deleted. Download or copy any files you need to keep before the retention window expires.
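
To plan downloads around this window, a client could compute the expected deadline from the job's creation time. The sketch below assumes you already have that time as a timezone-aware `datetime`; how the job exposes it is not specified here. Treat the result as an estimate, since the actual moment of cleanup may not match it to the second.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=1)  # "1 day from job creation", as stated above


def retention_deadline(created_at: datetime) -> datetime:
    """Estimated time after which the job's files may be deleted."""
    return created_at + RETENTION


def files_still_available(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the estimated retention window is still open.

    Both datetimes must be timezone-aware; mixing naive and aware values
    raises TypeError on comparison.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now < retention_deadline(created_at)
```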