Quick Start Guide

Get started with DocDigitizer's AI-powered document extraction API in minutes. This guide walks you through your first API call to extract structured data from documents.

Prerequisites

Before you begin, ensure you have the following:

  • API Key – Your unique API key provided by DocDigitizer. If you don’t have one, contact our sales team.
  • Context ID – A UUID that identifies your processing context/configuration. This is provided during your account setup.
  • A PDF document – The document you want to process (invoices, contracts, ID cards, etc.).

Authentication

All API requests require authentication using an API key. Include your API key in the request header:

Header       Value           Description
x-api-key    Your API Key    Required. Your unique API key for authentication.

Important: Keep your API key secure. Never expose it in client-side code or public repositories.
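
One common way to keep the key out of source code is to read it from an environment variable at runtime. The following is a minimal sketch, not an official DocDigitizer helper: the variable name DOCDIGITIZER_API_KEY and the function build_auth_headers are assumptions made for illustration.

```python
import os

# Hypothetical variable name; DocDigitizer does not prescribe one.
API_KEY_ENV_VAR = "DOCDIGITIZER_API_KEY"


def build_auth_headers(env=None):
    """Return the x-api-key header, reading the key from the environment."""
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR, "").strip()
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers argument of an HTTP client call, so the key never appears in the repository.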


Your First API Call

API Endpoint

POST https://apix.docdigitizer.com/sync

Request Format

The API accepts multipart/form-data requests with the following parameters:

Parameter    Type    Required    Description
files        File    Yes         The PDF document to process.
id           UUID    Yes         A unique identifier for this document/request. Generate a new UUID for each request.
contextId    UUID    Yes         Your context identifier that determines the processing pipeline and schema configuration.
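
Because both id and contextId must be UUIDs, it can help to build and check these fields before sending anything. The sketch below is illustrative only: build_form_fields is a hypothetical helper, and it assumes the API accepts the canonical hyphenated UUID form shown in the examples in this guide.

```python
import uuid


def build_form_fields(context_id):
    """Return the non-file form fields for a single request."""
    # uuid.UUID raises ValueError if context_id is not a valid UUID string.
    parsed_context = uuid.UUID(context_id)
    return {
        "id": str(uuid.uuid4()),           # fresh identifier for every request
        "contextId": str(parsed_context),  # canonical hyphenated form
    }
```

Calling the helper once per document gives each request its own id while reusing the same contextId.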

Example: cURL

curl -X POST https://apix.docdigitizer.com/sync \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/your/document.pdf" \
  -F "id=550e8400-e29b-41d4-a716-446655440000" \
  -F "contextId=YOUR_CONTEXT_ID"

Example: PowerShell

$headers = @{
    "x-api-key" = "YOUR_API_KEY"
}

# The -Form parameter used below requires PowerShell 6.1 or later.
$form = @{
    files = Get-Item -Path "C:\path\to\your\document.pdf"
    id = [guid]::NewGuid().ToString()  # Generate a new UUID for each request
    contextId = "YOUR_CONTEXT_ID"
}

$response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
    -Method Post `
    -Headers $headers `
    -Form $form

$response | ConvertTo-Json -Depth 10

Example: Python

import requests
import uuid

url = "https://apix.docdigitizer.com/sync"

headers = {
    "x-api-key": "YOUR_API_KEY"
}

# Generate a unique document ID
document_id = str(uuid.uuid4())

data = {
    "id": document_id,
    "contextId": "YOUR_CONTEXT_ID"
}

# Open the PDF in binary mode; the with-block closes the file after the upload
with open("/path/to/your/document.pdf", "rb") as pdf:
    response = requests.post(url, headers=headers, files={"files": pdf}, data=data)

print(response.json())

Example: JavaScript (Node.js)

const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch'); // require() works with node-fetch v2; v3 is ESM-only
const { v4: uuidv4 } = require('uuid');

const form = new FormData();
form.append('files', fs.createReadStream('/path/to/your/document.pdf'));
form.append('id', uuidv4());
form.append('contextId', 'YOUR_CONTEXT_ID');

fetch('https://apix.docdigitizer.com/sync', {
    method: 'POST',
    headers: {
        'x-api-key': 'YOUR_API_KEY'
    },
    body: form
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Request failed:', error));

Example: C# (.NET)

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

using var client = new HttpClient();
using var form = new MultipartFormDataContent();

// Add API key header
client.DefaultRequestHeaders.Add("x-api-key", "YOUR_API_KEY");

// Add file
var fileContent = new ByteArrayContent(File.ReadAllBytes(@"C:\path\to\your\document.pdf"));
fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
form.Add(fileContent, "files", "document.pdf");

// Add parameters
form.Add(new StringContent(Guid.NewGuid().ToString()), "id");
form.Add(new StringContent("YOUR_CONTEXT_ID"), "contextId");

// Send request
var response = await client.PostAsync("https://apix.docdigitizer.com/sync", form);
var result = await response.Content.ReadAsStringAsync();
Console.WriteLine(result);

Understanding the Response

A successful response returns a JSON object containing the extracted data from your document.

Response Structure

{
    "StateText": "COMPLETED",
    "TraceId": "ABC1234",
    "NumberPages": 2,
    "Output": [
        {
            "docType": "Invoice",
            "country": "PT",
            "pages": [1, 2],
            "schema": "Invoice_PT.json",
            "extraction": {
                "invoiceNumber": "INV-2024-001",
                "invoiceDate": "2024-01-15",
                "totalAmount": 1250.00,
                "vendorName": "Example Corp",
                "vendorTaxId": "123456789",
                ...
            }
        }
    ]
}

Response Fields

Field          Type       Description
StateText      String     Processing status: COMPLETED, PROCESSING, or ERROR.
TraceId        String     Unique trace identifier for debugging and support requests.
NumberPages    Integer    Number of pages in the processed document.
Output         Array      Array of extraction results. Multi-document PDFs may contain multiple entries.

Extraction Object Fields

Field         Type      Description
docType       String    The detected document type (Invoice, Contract, CitizenCard, etc.).
country       String    ISO country code for the document.
pages         Array     Page numbers where this document was found.
schema        String    The schema used for extraction.
extraction    Object    The extracted field values. Structure varies by document type.
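
Since Output may hold several entries for a multi-document PDF, client code usually loops over it rather than reading only the first element. Here is a small sketch of that loop. The helper name summarize_output is an assumption, and the sample payload is a trimmed copy of the response shown above, not real API output.

```python
def summarize_output(payload):
    """Return one (docType, country, pages, extraction) tuple per Output entry."""
    summaries = []
    for entry in payload.get("Output", []):
        summaries.append((
            entry.get("docType"),
            entry.get("country"),
            entry.get("pages", []),
            entry.get("extraction", {}),  # structure varies by document type
        ))
    return summaries


# Trimmed copy of the example response from this guide.
sample = {
    "StateText": "COMPLETED",
    "TraceId": "ABC1234",
    "NumberPages": 2,
    "Output": [{
        "docType": "Invoice",
        "country": "PT",
        "pages": [1, 2],
        "schema": "Invoice_PT.json",
        "extraction": {"invoiceNumber": "INV-2024-001", "totalAmount": 1250.00},
    }],
}

for doc_type, country, pages, extraction in summarize_output(sample):
    print(f"{doc_type} ({country}), pages {pages}: {extraction.get('invoiceNumber')}")
# prints: Invoice (PT), pages [1, 2]: INV-2024-001
```

Using .get with defaults keeps the loop from failing if a response omits a field, at the cost of returning None for missing values.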

Error Response

If an error occurs, StateText is set to ERROR and the Messages array describes what went wrong:

{
    "StateText": "ERROR",
    "TraceId": "XYZ9876",
    "Messages": [
        "Invalid file format. Only PDF files are accepted."
    ]
}
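
A client can turn this error shape into an exception so that failures are not mistaken for empty results. The sketch below assumes the field names shown above (StateText, TraceId, Messages). The ExtractionError class and raise_on_error function are hypothetical names, not part of any DocDigitizer SDK.

```python
class ExtractionError(Exception):
    """Hypothetical exception carrying the TraceId and messages of a failed call."""

    def __init__(self, trace_id, messages):
        self.trace_id = trace_id
        self.messages = list(messages)
        super().__init__(f"[TraceId {trace_id}] " + "; ".join(self.messages))


def raise_on_error(payload):
    """Raise ExtractionError if StateText is ERROR; otherwise return the payload."""
    if payload.get("StateText") == "ERROR":
        raise ExtractionError(payload.get("TraceId"), payload.get("Messages", []))
    return payload
```

Keeping the TraceId on the exception makes it easy to quote in a support request.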