Extract Endpoint

The Extract endpoint is the primary API for processing documents and extracting structured data. Submit a PDF document and receive extracted fields based on the detected document type.

Endpoint

URL https://apix.docdigitizer.com/sync
Method POST
Content-Type multipart/form-data

Authentication

All requests must include an API key in the request headers.

Header | Required | Description
x-api-key | Yes | Your DocDigitizer API key

See Authentication for details.


Request

Request Headers

Header | Required | Description
x-api-key | Yes | Your API key for authentication
Content-Type | Yes | Must be multipart/form-data
Accept | No | Default: application/json

Request Body Parameters

Parameter | Type | Required | Description
files | File (binary) | Yes | The PDF document to process. Must be a valid PDF file.
id | String (UUID) | Yes | Unique identifier for this document/request. Must be a valid UUID v4.
contextId | String (UUID) | Yes | Your context identifier. Determines the processing pipeline and schema configuration.
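
As a minimal sketch of preparing the two non-file fields, the helper below generates a fresh UUID v4 for id and checks that contextId parses as a UUID before anything is sent. The function name build_form_fields is hypothetical and not part of the DocDigitizer API; it relies only on Python's standard uuid module.

```python
import uuid


def build_form_fields(context_id: str) -> dict:
    """Return the non-file form fields for one request.

    A new UUID v4 is generated for `id`. `context_id` is parsed locally so a
    malformed value fails before the request is sent.
    """
    # uuid.UUID raises ValueError if context_id is not a well-formed UUID.
    parsed_context = uuid.UUID(context_id)
    return {
        "id": str(uuid.uuid4()),
        "contextId": str(parsed_context),
    }
```

The returned dictionary can be passed as the form data of the multipart request alongside the files part.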

File Requirements

Requirement | Value
File Format | PDF only (application/pdf)
Maximum File Size | 50 MB
Maximum Pages | 100 pages
Minimum Resolution | 72 DPI (300 DPI recommended for best results)
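
Some of these requirements can be checked locally before uploading. The sketch below covers only the size limit and the PDF file signature; page count and resolution would need a PDF library and are left out. The names check_pdf and preflight_check are hypothetical, and treating "50 MB" as 50 * 1024 * 1024 bytes is an assumption, since the table does not define the unit precisely.

```python
import os

# Assumption: "50 MB" is interpreted as 50 MiB; the documented limit may differ.
MAX_FILE_BYTES = 50 * 1024 * 1024


def check_pdf(size_bytes: int, head: bytes) -> list:
    """Return a list of locally detectable problems; empty means none found."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 50 MB limit")
    # PDF files conventionally begin with the "%PDF-" signature.
    if not head.startswith(b"%PDF-"):
        problems.append("file does not start with the %PDF- signature")
    return problems


def preflight_check(path: str) -> list:
    """Read the size and first bytes of the file at `path` and validate them."""
    with open(path, "rb") as handle:
        head = handle.read(5)
    return check_pdf(os.path.getsize(path), head)
```

A passing check does not guarantee the server will accept the file; it only catches the most common local mistakes early.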

Request Example

POST /sync HTTP/1.1
Host: apix.docdigitizer.com
x-api-key: dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="files"; filename="invoice.pdf"
Content-Type: application/pdf

[Binary PDF data]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="id"

550e8400-e29b-41d4-a716-446655440000
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="contextId"

123e4567-e89b-12d3-a456-426614174000
------WebKitFormBoundary7MA4YWxkTrZu0gW--
    

Response

Response Headers

Header | Description
Content-Type | application/json
X-DD-TraceId | Request trace identifier for debugging
X-DD-Timer-Total | Total processing time in milliseconds
X-DD-Timer-OCR | OCR processing time in milliseconds
X-DD-Timer-Classification | Document classification time in milliseconds
X-DD-Timer-Extraction | Data extraction time in milliseconds
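
The timing headers can be collected into a single mapping for logging. This is a sketch that assumes each X-DD-Timer-* value is a plain numeric string; the exact value format is not specified above, so non-numeric values are skipped. The function name parse_timers is hypothetical.

```python
def parse_timers(headers: dict) -> dict:
    """Collect X-DD-Timer-* headers into {stage: milliseconds}."""
    prefix = "x-dd-timer-"
    timers = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(prefix):
            try:
                timers[lowered[len(prefix):]] = float(value)
            except ValueError:
                continue  # skip values that are not plain numbers
    return timers
```

With the requests library, response.headers can be passed directly, since it behaves like a dictionary.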

Success Response (200 OK)

Response Body Schema

Field | Type | Description
StateText | String | Processing status: COMPLETED, PROCESSING, or ERROR
TraceId | String | Unique 7-character trace identifier for this request
NumberPages | Integer | Total number of pages in the processed document
Output | Array | Array of extraction results (see Output Object below)
Messages | Array | Array of informational or error messages (if any)

Output Object Schema

Each item in the Output array represents one detected document within the PDF.

Field | Type | Description
docType | String | Detected document type (e.g., Invoice, Contract, CitizenCard)
country | String | ISO 3166-1 alpha-2 country code (e.g., PT, US, GB)
pages | Array[Integer] | Page numbers where this document was detected (1-indexed)
schema | String | Schema identifier used for extraction (e.g., Invoice_PT.json)
extraction | Object | Extracted field values. Structure varies by document type.
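
Because a single PDF can contain several detected documents, client code usually iterates over Output rather than reading only the first item. The sketch below groups the items by docType; the function name group_by_doc_type is hypothetical, and it assumes the parsed JSON body follows the schema described above.

```python
from collections import defaultdict


def group_by_doc_type(body: dict) -> dict:
    """Group Output items by docType, keeping pages and extracted fields."""
    grouped = defaultdict(list)
    # Output may be missing or null in some responses, so fall back to [].
    for item in body.get("Output") or []:
        grouped[item["docType"]].append(
            {"pages": item.get("pages", []), "fields": item.get("extraction", {})}
        )
    return dict(grouped)
```

For a three-page PDF holding an invoice and a receipt, the result has one key per document type, each with the pages and fields of the matching documents.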

Success Response Example

{
    "StateText": "COMPLETED",
    "TraceId": "ABC1234",
    "NumberPages": 3,
    "Output": [
        {
            "docType": "Invoice",
            "country": "PT",
            "pages": [1, 2],
            "schema": "Invoice_PT.json",
            "extraction": {
                "invoiceNumber": "INV-2024-001234",
                "invoiceDate": "2024-01-15",
                "dueDate": "2024-02-15",
                "vendorName": "Fornecedor Exemplo, Lda",
                "vendorTaxId": "PT509123456",
                "vendorAddress": "Av. da Liberdade, 100, 1250-096 Lisboa",
                "customerName": "Cliente Exemplo, SA",
                "customerTaxId": "PT501234567",
                "customerAddress": "Rua Augusta, 50, 1100-053 Lisboa",
                "subtotal": 1000.00,
                "taxRate": 23,
                "taxAmount": 230.00,
                "totalAmount": 1230.00,
                "currency": "EUR",
                "paymentTerms": "30 days",
                "bankAccount": "PT50 0035 0000 00000000000 00",
                "lineItems": [
                    {
                        "description": "Professional Services - January 2024",
                        "quantity": 40,
                        "unit": "hours",
                        "unitPrice": 25.00,
                        "amount": 1000.00,
                        "taxRate": 23
                    }
                ]
            }
        },
        {
            "docType": "Receipt",
            "country": "PT",
            "pages": [3],
            "schema": "Receipt_PT.json",
            "extraction": {
                "receiptNumber": "REC-2024-005678",
                "receiptDate": "2024-01-20",
                "amount": 1230.00,
                "paymentMethod": "Bank Transfer",
                "relatedInvoice": "INV-2024-001234"
            }
        }
    ]
}
    

Document Types

DocDigitizer automatically detects and extracts data from various document types.

Supported Document Types

Document Type | docType Value | Description
Invoice | Invoice | Commercial invoices, bills
Receipt | Receipt | Payment receipts, sales receipts
Contract | Contract | Service contracts, agreements
Citizen Card | CitizenCard | National ID cards
Passport | Passport | International passports
Driver’s License | DriversLicense | Driving permits
Bank Statement | BankStatement | Account statements
Utility Bill | UtilityBill | Electricity, water, gas bills
Purchase Order | PurchaseOrder | POs, order confirmations
Delivery Note | DeliveryNote | Shipping documents, packing lists

See Schema Definition for detailed field specifications per document type.


Examples

cURL

curl -X POST https://apix.docdigitizer.com/sync \
  -H "x-api-key: dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6" \
  -F "files=@invoice.pdf" \
  -F "id=550e8400-e29b-41d4-a716-446655440000" \
  -F "contextId=123e4567-e89b-12d3-a456-426614174000"
    

PowerShell

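# Requires PowerShell 6.1 or later; the -Form parameter is not available in Windows PowerShell 5.1.
$headers = @{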
    "x-api-key" = "dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"
}

$form = @{
    files = Get-Item -Path "invoice.pdf"
    id = "550e8400-e29b-41d4-a716-446655440000"
    contextId = "123e4567-e89b-12d3-a456-426614174000"
}

$response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
    -Method Post -Headers $headers -Form $form

$response | ConvertTo-Json -Depth 10
    

Python

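import requests

url = "https://apix.docdigitizer.com/sync"
headers = {"x-api-key": "dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"}

data = {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "contextId": "123e4567-e89b-12d3-a456-426614174000"
}

# Open the PDF in a context manager so the file handle is closed after the upload.
with open("invoice.pdf", "rb") as pdf:
    # requests builds the multipart body and sets the Content-Type boundary itself.
    files = {"files": ("invoice.pdf", pdf, "application/pdf")}
    response = requests.post(url, headers=headers, files=files, data=data)

print(response.json())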
    

JavaScript (Node.js)

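// Requires the form-data and node-fetch packages (node-fetch v2; v3 is ESM-only and cannot be loaded with require).
const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch');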

const form = new FormData();
form.append('files', fs.createReadStream('invoice.pdf'));
form.append('id', '550e8400-e29b-41d4-a716-446655440000');
form.append('contextId', '123e4567-e89b-12d3-a456-426614174000');

fetch('https://apix.docdigitizer.com/sync', {
    method: 'POST',
    headers: {
        'x-api-key': 'dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6'
    },
    body: form
})
.then(res => res.json())
.then(data => console.log(JSON.stringify(data, null, 2)))
.catch(err => console.error('Request failed:', err));
    

C# (.NET)

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

using var client = new HttpClient();
using var form = new MultipartFormDataContent();

client.DefaultRequestHeaders.Add("x-api-key", "dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6");

var fileContent = new ByteArrayContent(File.ReadAllBytes("invoice.pdf"));
fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
form.Add(fileContent, "files", "invoice.pdf");
form.Add(new StringContent("550e8400-e29b-41d4-a716-446655440000"), "id");
form.Add(new StringContent("123e4567-e89b-12d3-a456-426614174000"), "contextId");

var response = await client.PostAsync("https://apix.docdigitizer.com/sync", form);
var result = await response.Content.ReadAsStringAsync();
Console.WriteLine(result);
    

PHP

<?php
$curl = curl_init();

$postFields = [
    'files' => new CURLFile('invoice.pdf', 'application/pdf'),
    'id' => '550e8400-e29b-41d4-a716-446655440000',
    'contextId' => '123e4567-e89b-12d3-a456-426614174000'
];

curl_setopt_array($curl, [
    CURLOPT_URL => 'https://apix.docdigitizer.com/sync',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postFields,
    CURLOPT_HTTPHEADER => [
        'x-api-key: dd_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6'
    ]
]);

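$response = curl_exec($curl);
if ($response === false) {
    // curl_exec returns false on a transport failure; report the cURL error instead.
    $response = 'cURL error: ' . curl_error($curl);
}
curl_close($curl);

echo $response;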
?>
    

Rate Limits

API requests are subject to rate limiting to ensure fair usage.

Limit Type | Default | Description
Requests per minute | 60 | Maximum API calls per minute per API key
Concurrent requests | 10 | Maximum simultaneous requests per API key
Daily requests | Based on plan | Maximum requests per 24-hour period

Rate Limit Headers

Header | Description
X-RateLimit-Limit | Maximum requests allowed in the current window
X-RateLimit-Remaining | Requests remaining in the current window
X-RateLimit-Reset | Unix timestamp when the limit resets
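
A client can use these headers to pause until the window resets instead of sending requests that will be rejected. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds and that X-RateLimit-Remaining is an integer string. The function name seconds_until_reset is hypothetical.

```python
import time
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before the next request.

    Returns 0.0 when requests remain in the window or when the rate limit
    headers are missing or unparseable.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        reset_at = float(lowered["x-ratelimit-reset"])  # assumed to be in seconds
    except (KeyError, ValueError):
        return 0.0
    if remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Calling time.sleep(seconds_until_reset(response.headers)) before the next request is one simple way to apply it.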

Contact sales@docdigitizer.com for higher rate limits.


Errors

HTTP Status Codes

Code | Status | Description
200 | OK | Request processed successfully (check StateText for result)
400 | Bad Request | Invalid request format or missing required parameters
401 | Unauthorized | Missing or invalid API key
403 | Forbidden | Valid API key but insufficient permissions
413 | Payload Too Large | File exceeds maximum size limit (50 MB)
415 | Unsupported Media Type | File is not a valid PDF
429 | Too Many Requests | Rate limit exceeded
500 | Internal Server Error | Server error; retry or contact support
503 | Service Unavailable | Service temporarily unavailable; retry later
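
The table suggests that 429, 500, and 503 may succeed on retry, while the other error codes indicate a problem with the request itself. A minimal sketch of that split, paired with a capped exponential backoff schedule, follows. The retry set and the delay values are assumptions for illustration, not documented DocDigitizer behavior, and both function names are hypothetical.

```python
# Assumption: these are the only status codes worth retrying automatically.
RETRYABLE_STATUS_CODES = {429, 500, 503}


def should_retry(status_code: int) -> bool:
    """Return True for status codes that may succeed on a later attempt."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Return the wait in seconds before each retry: base, 2*base, 4*base, up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(attempts)]
```

A caller would loop over backoff_delays(n), sleeping for each delay and stopping as soon as should_retry returns False for the latest response.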

Error Response Format

{
    "StateText": "ERROR",
    "TraceId": "XYZ9876",
    "Messages": [
        "Error description here"
    ]
}
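
Since a response body can report StateText as ERROR, it helps to check the body rather than rely on the HTTP status alone. The sketch below turns an error body into an exception that carries the TraceId, which is useful when contacting support. ExtractionError and raise_on_error are hypothetical names, not part of any DocDigitizer SDK.

```python
class ExtractionError(Exception):
    """Raised when a response body reports StateText == "ERROR"."""

    def __init__(self, trace_id, messages):
        self.trace_id = trace_id
        self.messages = [str(message) for message in messages]
        super().__init__(f"[{trace_id}] " + "; ".join(self.messages))


def raise_on_error(body: dict) -> dict:
    """Return the body unchanged unless it reports an error state."""
    if body.get("StateText") == "ERROR":
        raise ExtractionError(body.get("TraceId"), body.get("Messages") or [])
    return body
```

Typical use is raise_on_error(response.json()) right after the request, so that later code only ever sees non-error bodies.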
    

Common Errors

Error Message | Cause | Solution
Invalid API key | API key not recognized | Verify your API key is correct
Missing required parameter: id | Document ID not provided | Include the id parameter with a valid UUID
Missing required parameter: contextId | Context ID not provided | Include the contextId parameter
Invalid file format | File is not a valid PDF | Ensure you’re uploading a genuine PDF file
File too large | PDF exceeds 50 MB limit | Reduce file size or split into smaller documents
Invalid UUID format | id or contextId not in UUID format | Use a valid UUID v4 (e.g., 550e8400-e29b-41d4-a716-446655440000)
Context not found | Context ID doesn’t exist or is not associated with the API key | Verify your Context ID is correct
Document processing failed | Unable to extract data from document | Check document quality; contact support with the TraceId

See Error Handling for best practices on handling errors.