Retrieving extraction results

Getting a single job

Endpoint: GET /v2/extract/{jobId}

Returns the current state of an extraction job. If the job is still processing, poll this endpoint until state reaches completed, or use a webhook to avoid polling entirely.

Pass ?format=markdown to receive a human-readable Markdown document instead of JSON (see Output formats).

import { Exabase } from "@exabase/sdk";

const api = new Exabase({
  apiKey: process.env.EXABASE_API_KEY,
});

const job = await api.extract.get({ jobId: "job-id" });

if (job.state === "completed") {
  console.log(job.extraction?.common?.mimeType);
  console.log(job.extraction?.common?.chunkCount);
  console.log(job.extraction?.document?.pages);
  console.log(job.extraction?.document?.documentType); // "invoice" | "resume" | "contract" | "other"
  console.log(job.extraction?.document?.structured);   // type-specific structured data
}

curl 'https://api.exabase.io/v2/extract/<jobId>' \
  -H 'X-Api-Key: <EXABASE_API_KEY>'

curl 'https://api.exabase.io/v2/extract/<jobId>?format=markdown' \
  -H 'X-Api-Key: <EXABASE_API_KEY>'
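
The polling approach described above can be sketched as a small helper. This is a minimal sketch, not part of the SDK: waitForJob, its options, and the default interval and attempt limit are hypothetical. Only the completed state is confirmed by this page, so any other terminal state (for example a failure state) has to be passed in explicitly.

```typescript
// Hypothetical helper, not part of the SDK. Only `state` is needed for polling.
interface JobLike {
  state: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fetchJob repeatedly until the job reaches one of terminalStates,
// or throws once maxAttempts calls have been made.
async function waitForJob<T extends JobLike>(
  fetchJob: () => Promise<T>,
  options: {
    intervalMs?: number;
    maxAttempts?: number;
    terminalStates?: string[]; // assumption: add other terminal states if your jobs can end in them
  } = {},
): Promise<T> {
  const {
    intervalMs = 2000,
    maxAttempts = 30,
    terminalStates = ["completed"],
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchJob();
    if (terminalStates.includes(job.state)) return job;
    // Wait before the next call, but not after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(
    `Job did not reach a terminal state after ${maxAttempts} attempts`,
  );
}
```

With the SDK client from the example above, this could be called as waitForJob(() => api.extract.get({ jobId: "job-id" })). Capping the number of attempts keeps a stuck job from blocking the caller indefinitely.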

Example response for a completed invoice PDF (format=json, default):

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "workspaceId": "...",
  "userId": "...",
  "kind": "document",
  "name": "Invoice #1042",
  "url": null,
  "state": "completed",
  "createdAt": "2025-06-01T10:00:00.000Z",
  "extraction": {
    "common": {
      "mimeType": "application/pdf",
      "size": 204800,
      "thumbnail": "https://cdn.exabase.io/...",
      "chunkCount": 8
    },
    "document": {
      "pages": 2,
      "author": "Acme Corp",
      "title": "Invoice #1042",
      "creationDate": "2025-05-28T00:00:00.000Z",
      "pdfRender": { "url": "https://cdn.exabase.io/..." },
      "documentType": "invoice",
      "structured": {
        "vendor": "Acme Corp",
        "invoiceNumber": "1042",
        "issueDate": "2025-05-28",
        "dueDate": "2025-06-28",
        "lineItems": [
          { "description": "Widget A", "quantity": 10, "unitPrice": 5.00, "amount": 50.00 },
          { "description": "Widget B", "quantity": 2, "unitPrice": 25.00, "amount": 50.00 }
        ],
        "total": 100.00
      }
    }
  },
  "links": {
    "chunks": "https://api.exabase.io/v2/extract/<jobId>/chunks?start=1&end=20",
    "download": "https://api.exabase.io/v2/extract/<jobId>/download"
  }
}
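
When consuming the structured invoice data, it can help to type it and sanity-check it before use. The sketch below infers its interfaces from the example response above. Which fields are always present is an assumption, and lineItemsMatchTotal is a hypothetical helper rather than an SDK function.

```typescript
// Shapes inferred from the example response; treat them as assumptions,
// since real responses may omit fields or include others.
interface InvoiceLineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  amount: number;
}

interface InvoiceStructured {
  vendor: string;
  invoiceNumber: string;
  issueDate: string;
  dueDate: string;
  lineItems: InvoiceLineItem[];
  total: number;
}

// Returns true when the line item amounts add up to the invoice total,
// within a small tolerance to absorb floating-point rounding.
function lineItemsMatchTotal(
  invoice: InvoiceStructured,
  tolerance = 0.005,
): boolean {
  const sum = invoice.lineItems.reduce((acc, item) => acc + item.amount, 0);
  return Math.abs(sum - invoice.total) <= tolerance;
}

const example: InvoiceStructured = {
  vendor: "Acme Corp",
  invoiceNumber: "1042",
  issueDate: "2025-05-28",
  dueDate: "2025-06-28",
  lineItems: [
    { description: "Widget A", quantity: 10, unitPrice: 5.0, amount: 50.0 },
    { description: "Widget B", quantity: 2, unitPrice: 25.0, amount: 50.0 },
  ],
  total: 100.0,
};

console.log(lineItemsMatchTotal(example)); // true
```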

Same job with ?format=markdown:

# Invoice #1042

**Kind:** document  
**Created:** 2025-06-01T10:00:00.000Z  
**MIME Type:** application/pdf  
**Size:** 204800 bytes  

## Document

**Author:** Acme Corp  
**Pages:** 2  
**Created:** 2025-05-28T00:00:00.000Z  

## Invoice

**Vendor:** Acme Corp  
**Invoice number:** 1042  
**Issue date:** 2025-05-28  
**Due date:** 2025-06-28  
**Line items:**
  - **Description:** Widget A  
    **Quantity:** 10  
    **Unit price:** 5  
    **Amount:** 50  
  - **Description:** Widget B  
    **Quantity:** 2  
    **Unit price:** 25  
    **Amount:** 50  

**Total:** 100

Listing jobs

Endpoint: GET /v2/extract

Returns jobs in reverse-chronological order (newest first). To page through results, pass the nextCursor value from each response as cursor in the next request. You can filter by state or kind.

let cursor: string | null = null;

do {
  const page = await api.extract.list({
    limit: 50,
    ...(cursor && { cursor }),
    state: "completed",
  });

  for (const job of page.items) {
    console.log(job.id, job.name, job.state);
  }

  cursor = page.nextCursor;
} while (cursor);

curl 'https://api.exabase.io/v2/extract?limit=50&state=completed' \
  -H 'X-Api-Key: <EXABASE_API_KEY>'
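
The cursor loop above can also be wrapped in an async generator so callers iterate over jobs without handling cursors themselves. This is a minimal sketch: iterateAll and the Page type are hypothetical, and it assumes each page exposes items and a nextCursor that is empty, null, or missing on the last page.

```typescript
// Hypothetical page shape, modeled on the items/nextCursor fields used above.
interface Page<T> {
  items: T[];
  nextCursor?: string | null;
}

// Yields every item from every page, requesting the next page only
// when the consumer has worked through the current one.
async function* iterateAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): AsyncGenerator<T> {
  let cursor: string | undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    for (const item of page.items) {
      yield item;
    }
    // Treat null, undefined, and "" as the end of the list.
    cursor = page.nextCursor || undefined;
  } while (cursor);
}
```

With the SDK client, fetchPage could be (cursor) => api.extract.list({ limit: 50, state: "completed", ...(cursor && { cursor }) }), consumed with for await (const job of iterateAll(fetchPage)).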

Reading text chunks

When extraction.common.chunkCount is set, the text content has been split into searchable chunks that you can retrieve in pages.

Endpoint: GET /v2/extract/{jobId}/chunks

const result = await api.extract.getChunks({
  jobId: "job-id",
  start: 1,
  end: 20,
});

for (const chunk of result.items) {
  console.log(chunk.sequence, chunk.text);
  // chunk.pageNumber — set for document chunks
  // chunk.timeStart / chunk.timeEnd — set for audio/video chunks
}

curl 'https://api.exabase.io/v2/extract/<jobId>/chunks?start=1&end=20' \
  -H 'X-Api-Key: <EXABASE_API_KEY>'
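
To read every chunk of a job, the start and end windows can be derived from chunkCount. The sketch below assumes start and end are 1-based and inclusive, as the start=1&end=20 example suggests but this page does not state. chunkRanges is a hypothetical helper and the page size of 20 is only a default.

```typescript
interface ChunkRange {
  start: number;
  end: number;
}

// Splits 1..chunkCount into consecutive windows of at most pageSize chunks.
// Assumption: the API treats start and end as 1-based, inclusive bounds.
function chunkRanges(chunkCount: number, pageSize = 20): ChunkRange[] {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 1; start <= chunkCount; start += pageSize) {
    ranges.push({ start, end: Math.min(start + pageSize - 1, chunkCount) });
  }
  return ranges;
}

console.log(chunkRanges(45));
// [ { start: 1, end: 20 }, { start: 21, end: 40 }, { start: 41, end: 45 } ]
```

Each range can then be passed to api.extract.getChunks({ jobId, start, end }) in turn.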

Downloading attachments

Download all files associated with a job (original file, thumbnail, screenshot, transcript, PDF render, etc.) as a single ZIP archive. Available files depend on the extraction type (see the extraction data table on the About page).

Endpoint: GET /v2/extract/{jobId}/download

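import { createWriteStream } from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";

const jobId = "job-id"; // replace with the ID of your extraction job

const response = await fetch(
  `https://api.exabase.io/v2/extract/${jobId}/download`,
  { headers: { "X-Api-Key": process.env.EXABASE_API_KEY! } },
);

if (!response.ok || !response.body) {
  throw new Error(`Download failed with status ${response.status}`);
}

// Stream the ZIP to disk and wait until the file is fully written.
await pipeline(Readable.fromWeb(response.body), createWriteStream("attachments.zip"));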

curl 'https://api.exabase.io/v2/extract/<jobId>/download' \
  -H 'X-Api-Key: <EXABASE_API_KEY>' \
  --output attachments.zip
