Your First API Call

This step-by-step tutorial walks you through making your first successful API call to DocDigitizer. By the end, you'll have extracted structured data from a PDF document.

Before You Begin

Ensure you have the following ready:

| Item | Description | Status |
|------|-------------|--------|
| API Key | Your DocDigitizer API key | [ ] Ready |
| Context ID | Your processing context UUID | [ ] Ready |
| PDF Document | A sample PDF to process (invoice, ID card, contract, etc.) | [ ] Ready |
| HTTP Client | cURL, PowerShell, Postman, or your preferred tool | [ ] Ready |

If you don’t have your credentials yet, see the Installation guide.


Step 1: Prepare Your Request

Every API call requires three pieces of information in the request body and one in the headers.

Request Components

| Component | Location | Value |
|-----------|----------|-------|
| x-api-key | Header | Your API key |
| files | Body (form-data) | Your PDF file |
| id | Body (form-data) | A new UUID for this request |
| contextId | Body (form-data) | Your Context ID |
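
As a minimal sketch, the header and the two text fields listed above can be assembled in Python before the PDF is attached. The helper name build_request_parts is hypothetical and not part of any DocDigitizer SDK; only the field names come from this tutorial.

```python
import uuid


def build_request_parts(api_key: str, context_id: str) -> tuple[dict, dict]:
    """Return (headers, form_fields) for one request.

    Hypothetical helper: the PDF itself is attached separately as the
    'files' form-data part when the request is sent.
    """
    headers = {"x-api-key": api_key}
    form_fields = {
        "id": str(uuid.uuid4()),  # a fresh document ID for every request
        "contextId": context_id,
    }
    return headers, form_fields


headers, form_fields = build_request_parts("YOUR_API_KEY", "YOUR_CONTEXT_ID")
print(headers)
print(sorted(form_fields))
```

Keeping the ID generation inside the helper makes it harder to reuse an old document ID by accident.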

Generate a Document ID

Each request needs a unique document ID (UUID). Here’s how to generate one:

PowerShell

[guid]::NewGuid().ToString()
# Example output (yours will differ): 550e8400-e29b-41d4-a716-446655440000

Linux/macOS

uuidgen
# Example output (yours will differ): 550e8400-e29b-41d4-a716-446655440000

Python

import uuid
print(uuid.uuid4())
# Example output (yours will differ): 550e8400-e29b-41d4-a716-446655440000

Online

Use an online UUID generator like uuidgenerator.net
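
If the ID is pasted in by hand, for example from an online generator, a quick format check before sending can catch copy errors. The sketch below assumes the API expects the canonical hyphenated form, which this tutorial does not confirm; the function name is_valid_uuid is hypothetical.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if value is a UUID in the canonical 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare
    # the parsed value against the canonical string form.
    return str(parsed) == value.lower()


print(is_valid_uuid("550e8400-e29b-41d4-a716-446655440000"))  # True
print(is_valid_uuid("not-a-uuid"))  # False
```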


Step 2: Send the Request

Choose your preferred method to send the API request.

Option A: Using cURL (Recommended for Testing)

Open your terminal and run:

curl -X POST https://apix.docdigitizer.com/sync \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/your/document.pdf" \
  -F "id=550e8400-e29b-41d4-a716-446655440000" \
  -F "contextId=YOUR_CONTEXT_ID"

Replace:

  • YOUR_API_KEY – Your actual API key
  • /path/to/your/document.pdf – Path to your PDF file
  • 550e8400-... – A newly generated UUID
  • YOUR_CONTEXT_ID – Your Context ID

Option B: Using PowerShell

# Set your credentials
$apiKey = "YOUR_API_KEY"
$contextId = "YOUR_CONTEXT_ID"
$pdfPath = "C:\path\to\your\document.pdf"

# Create the request
$headers = @{
    "x-api-key" = $apiKey
}

$form = @{
    files = Get-Item -Path $pdfPath
    id = [guid]::NewGuid().ToString()
    contextId = $contextId
}

# Send the request (the -Form parameter requires PowerShell 6.1 or later)
$response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
    -Method Post `
    -Headers $headers `
    -Form $form

# Display the response
$response | ConvertTo-Json -Depth 10

Option C: Using Postman

  1. Create a new POST request
  2. Set URL to: https://apix.docdigitizer.com/sync
  3. Go to Headers tab:
    • Add header: x-api-key = YOUR_API_KEY
  4. Go to Body tab:
    • Select form-data
    • Add key files (type: File) – select your PDF
    • Add key id (type: Text) – enter a UUID
    • Add key contextId (type: Text) – enter your Context ID
  5. Click Send

Expected Processing Time

Processing time depends on document complexity:

| Document Type | Typical Time |
|---------------|--------------|
| Single-page invoice | 2-5 seconds |
| Multi-page contract | 5-15 seconds |
| Complex multi-document PDF | 15-60 seconds |
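
These figures can guide the client-side timeout. The sketch below takes the upper bounds from the table above and adds headroom; the dictionary keys, the safety factor of 2, and the 10-second connect timeout are illustrative assumptions, not values recommended by DocDigitizer.

```python
# Upper bounds in seconds, taken from the table above.
# The key names are hypothetical labels for this sketch.
TYPICAL_MAX_SECONDS = {
    "single_page_invoice": 5,
    "multi_page_contract": 15,
    "complex_multi_document": 60,
}


def read_timeout(kind: str, safety_factor: float = 2.0) -> float:
    """Return a read timeout with headroom above the typical maximum."""
    return TYPICAL_MAX_SECONDS[kind] * safety_factor


# The requests library accepts a (connect_timeout, read_timeout) tuple.
timeout = (10, read_timeout("complex_multi_document"))
print(timeout)  # (10, 120.0)
```

The resulting tuple can be passed as the timeout argument of requests.post.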

Step 3: Understand the Response

A successful API call returns a JSON response with the extracted data.

Success Response Example

{
    "StateText": "COMPLETED",
    "TraceId": "ABC1234",
    "NumberPages": 2,
    "Output": [
        {
            "docType": "Invoice",
            "country": "PT",
            "pages": [1, 2],
            "schema": "Invoice_PT.json",
            "extraction": {
                "invoiceNumber": "INV-2024-001",
                "invoiceDate": "2024-01-15",
                "dueDate": "2024-02-15",
                "vendorName": "Example Supplier Ltd",
                "vendorTaxId": "PT123456789",
                "customerName": "Your Company",
                "subtotal": 1000.00,
                "taxAmount": 230.00,
                "totalAmount": 1230.00,
                "currency": "EUR",
                "lineItems": [
                    {
                        "description": "Professional Services",
                        "quantity": 10,
                        "unitPrice": 100.00,
                        "amount": 1000.00
                    }
                ]
            }
        }
    ]
}

Response Fields Explained

| Field | Description |
|-------|-------------|
| StateText | COMPLETED = success, ERROR = failure |
| TraceId | Unique ID for this request (save this for support inquiries) |
| NumberPages | Total pages in the PDF |
| Output | Array of extracted documents (one PDF can contain multiple documents) |
| Output[].docType | Detected document type (Invoice, Contract, CitizenCard, etc.) |
| Output[].extraction | The extracted field values |
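
Because Output is an array, a PDF that contains several documents yields several entries. The following sketch walks every entry rather than only the first; the summarize function is hypothetical, and the sample dictionary is a trimmed copy of the success response shown earlier.

```python
def summarize(result: dict) -> list[str]:
    """Return one summary line per extracted document."""
    if result.get("StateText") != "COMPLETED":
        raise ValueError(
            f"Request did not complete (TraceId={result.get('TraceId')})"
        )
    lines = []
    for doc in result.get("Output", []):
        pages = ", ".join(str(p) for p in doc.get("pages", []))
        field_count = len(doc.get("extraction", {}))
        lines.append(f"{doc.get('docType')} (pages {pages}): {field_count} fields")
    return lines


# Trimmed copy of the sample response from this tutorial
sample = {
    "StateText": "COMPLETED",
    "TraceId": "ABC1234",
    "NumberPages": 2,
    "Output": [
        {
            "docType": "Invoice",
            "pages": [1, 2],
            "extraction": {"invoiceNumber": "INV-2024-001", "totalAmount": 1230.00},
        }
    ],
}

print(summarize(sample))  # ['Invoice (pages 1, 2): 2 fields']
```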

Step 4: Extract the Data

Now that you have a response, here’s how to access the extracted data in different languages.

PowerShell

# After receiving $response from the API call

# Get the first document's extraction
$extraction = $response.Output[0].extraction

# Access specific fields
Write-Host "Invoice Number: $($extraction.invoiceNumber)"
Write-Host "Total Amount: $($extraction.totalAmount)"
Write-Host "Vendor: $($extraction.vendorName)"

# Loop through line items
foreach ($item in $extraction.lineItems) {
    Write-Host "  - $($item.description): $($item.amount)"
}

Python

import requests
import uuid

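# Make the API call; the with-block closes the file once the upload finishes
with open("document.pdf", "rb") as pdf_file:
    response = requests.post(
        "https://apix.docdigitizer.com/sync",
        headers={"x-api-key": "YOUR_API_KEY"},
        files={"files": pdf_file},
        data={
            "id": str(uuid.uuid4()),
            "contextId": "YOUR_CONTEXT_ID"
        },
        timeout=300  # complex documents can take a minute or more
    )

# Parse the response
result = response.json()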

# Check for success
if result["StateText"] == "COMPLETED":
    # Get the first document
    doc = result["Output"][0]
    extraction = doc["extraction"]

    print(f"Document Type: {doc['docType']}")
    print(f"Invoice Number: {extraction.get('invoiceNumber', 'N/A')}")
    print(f"Total Amount: {extraction.get('totalAmount', 'N/A')}")
else:
    print(f"Error: {result.get('Messages', ['Unknown error'])}")

JavaScript (Node.js)

const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch'); // node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const { v4: uuidv4 } = require('uuid');

async function extractDocument() {
    const form = new FormData();
    form.append('files', fs.createReadStream('document.pdf'));
    form.append('id', uuidv4());
    form.append('contextId', 'YOUR_CONTEXT_ID');

    const response = await fetch('https://apix.docdigitizer.com/sync', {
        method: 'POST',
        headers: { 'x-api-key': 'YOUR_API_KEY' },
        body: form
    });

    const result = await response.json();

    if (result.StateText === 'COMPLETED') {
        const doc = result.Output[0];
        const extraction = doc.extraction;

        console.log(`Document Type: ${doc.docType}`);
        console.log(`Invoice Number: ${extraction.invoiceNumber}`);
        console.log(`Total Amount: ${extraction.totalAmount}`);
    } else {
        console.error('Error:', result.Messages);
    }
}

extractDocument();

C# (.NET)

using System;
using System.Text.Json;

// After making the HTTP request and getting responseString
// JsonDocument is IDisposable, so dispose it when you are done with it
using var result = JsonDocument.Parse(responseString);
var root = result.RootElement;

if (root.GetProperty("StateText").GetString() == "COMPLETED")
{
    var output = root.GetProperty("Output")[0];
    var extraction = output.GetProperty("extraction");

    Console.WriteLine($"Document Type: {output.GetProperty("docType")}");
    Console.WriteLine($"Invoice Number: {extraction.GetProperty("invoiceNumber")}");
    Console.WriteLine($"Total Amount: {extraction.GetProperty("totalAmount")}");
}
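
Once the fields are in hand, a simple arithmetic check can flag extractions worth a manual look. The sketch below uses the invoice field names from the sample response (subtotal, taxAmount, totalAmount, lineItems); other document types will have different fields, and both the function name and the 0.01 tolerance are assumptions made for illustration.

```python
from decimal import Decimal


def totals_are_consistent(extraction: dict, tolerance: str = "0.01") -> bool:
    """Check that line items sum to the subtotal and subtotal + tax = total."""

    def dec(value) -> Decimal:
        # Convert via str so floats such as 1230.0 do not carry binary noise
        return Decimal(str(value))

    tol = Decimal(tolerance)
    subtotal = dec(extraction["subtotal"])
    tax = dec(extraction["taxAmount"])
    total = dec(extraction["totalAmount"])

    items = extraction.get("lineItems") or []
    if items:
        line_sum = sum((dec(item["amount"]) for item in items), Decimal("0"))
        if abs(line_sum - subtotal) > tol:
            return False
    return abs(subtotal + tax - total) <= tol


# Values copied from the sample response in this tutorial
sample_extraction = {
    "subtotal": 1000.00,
    "taxAmount": 230.00,
    "totalAmount": 1230.00,
    "lineItems": [{"description": "Professional Services", "amount": 1000.00}],
}
print(totals_are_consistent(sample_extraction))  # True
```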

Complete Working Examples

Copy and paste these complete examples to get started quickly.

Complete PowerShell Script

# DocDigitizer API - Complete Example
# Save as: Extract-Document.ps1
# Requires PowerShell 6.1 or later (uses Invoke-RestMethod -Form)

param(
    [Parameter(Mandatory=$true)]
    [string]$PdfPath,

    [Parameter(Mandatory=$true)]
    [string]$ApiKey,

    [Parameter(Mandatory=$true)]
    [string]$ContextId
)

# Validate file exists
if (-not (Test-Path $PdfPath)) {
    Write-Error "File not found: $PdfPath"
    exit 1
}

# Create request
$headers = @{ "x-api-key" = $ApiKey }
$documentId = [guid]::NewGuid().ToString()

$form = @{
    files = Get-Item -Path $PdfPath
    id = $documentId
    contextId = $ContextId
}

Write-Host "Sending document to DocDigitizer API..."
Write-Host "Document ID: $documentId"

try {
    $response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
        -Method Post `
        -Headers $headers `
        -Form $form

    if ($response.StateText -eq "COMPLETED") {
        Write-Host "`nExtraction successful!" -ForegroundColor Green
        Write-Host "Trace ID: $($response.TraceId)"
        Write-Host "Pages: $($response.NumberPages)"
        Write-Host "`nExtracted Documents:"

        foreach ($doc in $response.Output) {
            Write-Host "`n--- $($doc.docType) ---"
            $doc.extraction | Format-List
        }
    } else {
        Write-Host "`nExtraction failed!" -ForegroundColor Red
        Write-Host "Messages: $($response.Messages -join ', ')"
    }
}
catch {
    Write-Error "API request failed: $_"
}

# Usage: .\Extract-Document.ps1 -PdfPath "invoice.pdf" -ApiKey "your_key" -ContextId "your_context"

Complete Python Script

#!/usr/bin/env python3
"""
DocDigitizer API - Complete Example
Save as: extract_document.py
"""

import requests
import uuid
import sys
import json
import os

def extract_document(pdf_path, api_key, context_id):
    """Extract data from a PDF document using DocDigitizer API."""

    # Validate file exists
    if not os.path.exists(pdf_path):
        print(f"Error: File not found: {pdf_path}")
        return None

    # Generate unique document ID
    document_id = str(uuid.uuid4())

    print(f"Sending document to DocDigitizer API...")
    print(f"Document ID: {document_id}")

    # Make API request; the with-block closes the file after the upload
    try:
        with open(pdf_path, "rb") as pdf_file:
            response = requests.post(
                "https://apix.docdigitizer.com/sync",
                headers={"x-api-key": api_key},
                files={"files": pdf_file},
                data={
                    "id": document_id,
                    "contextId": context_id
                },
                timeout=300  # 5 minute timeout
            )
        response.raise_for_status()
        result = response.json()

    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
        return None

    # Process response
    if result.get("StateText") == "COMPLETED":
        print(f"\nExtraction successful!")
        print(f"Trace ID: {result.get('TraceId')}")
        print(f"Pages: {result.get('NumberPages')}")

        for i, doc in enumerate(result.get("Output", []), 1):
            print(f"\n--- Document {i}: {doc.get('docType')} ---")
            print(json.dumps(doc.get("extraction", {}), indent=2))

        return result
    else:
        print(f"\nExtraction failed!")
        print(f"Messages: {result.get('Messages', ['Unknown error'])}")
        return None

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python extract_document.py <pdf_path> <api_key> <context_id>")
        sys.exit(1)

    extract_document(sys.argv[1], sys.argv[2], sys.argv[3])
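
Passing the API key on the command line leaves it in shell history. One alternative, sketched below, reads both credentials from environment variables. The variable names DOCDIGITIZER_API_KEY and DOCDIGITIZER_CONTEXT_ID and the load_credentials helper are hypothetical; DocDigitizer does not define them.

```python
import os


def load_credentials(env=None) -> tuple[str, str]:
    """Read the API key and Context ID from environment variables.

    The variable names are illustrative assumptions for this sketch.
    """
    env = os.environ if env is None else env
    names = ("DOCDIGITIZER_API_KEY", "DOCDIGITIZER_CONTEXT_ID")
    missing = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip stray whitespace, a common cause of 401 responses
    api_key, context_id = (env[name].strip() for name in names)
    return api_key, context_id


demo_env = {
    "DOCDIGITIZER_API_KEY": " my-key ",
    "DOCDIGITIZER_CONTEXT_ID": "my-context",
}
print(load_credentials(demo_env))  # ('my-key', 'my-context')
```

The returned pair can be passed straight to extract_document in place of the command-line arguments.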

Troubleshooting

Common Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| 401 Unauthorized | Invalid API key | Check that your API key is correct and has no extra spaces |
| 400 Bad Request | Missing required fields | Ensure files, id, and contextId are all present |
| Invalid file format | File is not a valid PDF | Verify the file is a genuine PDF (check magic bytes) |
| Timeout | Large or complex document | Increase the timeout; consider splitting large PDFs |
| Empty extraction | Document type not supported or poor quality | Check document quality; contact support if the issue persists |