Error Handling

This guide covers how to handle errors from the DocDigitizer API, including error response formats, common error scenarios, and best practices for building robust integrations.

Error Response Format

When an error occurs, the API returns a JSON response with details about the failure.

Standard Error Response

{
    "StateText": "ERROR",
    "TraceId": "XYZ9876",
    "Messages": [
        "Primary error message",
        "Additional context or details"
    ]
}
    

Error Response Fields

Field | Type | Description
StateText | String | Always "ERROR" for failed requests
TraceId | String | Unique identifier for tracing the request. Include this when contacting support.
Messages | Array[String] | One or more error messages describing the failure

HTTP Error Response (Non-200)

For HTTP-level errors (4xx, 5xx), the response may also include:

{
    "error": "Unauthorized",
    "message": "Invalid API key",
    "statusCode": 401
}
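
Because two response shapes are possible, client code may find it convenient to normalize them into one structure before deciding what to do. The following is a minimal sketch under that assumption; `ApiError` and `parse_error` are hypothetical names chosen for illustration and are not part of the DocDigitizer API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ApiError:
    """Normalized view of either error shape shown above (hypothetical helper type)."""
    messages: List[str] = field(default_factory=list)
    trace_id: Optional[str] = None      # only present in the standard shape
    status_code: Optional[int] = None   # only present in the HTTP-level shape


def parse_error(body: Dict[str, Any]) -> Optional[ApiError]:
    """Return an ApiError if `body` looks like an error response, otherwise None."""
    # Standard shape: StateText / TraceId / Messages
    if body.get("StateText") == "ERROR":
        return ApiError(
            messages=list(body.get("Messages") or []),
            trace_id=body.get("TraceId"),
        )
    # HTTP-level shape: error / message / statusCode
    if "error" in body or "statusCode" in body:
        parts = [body.get("error"), body.get("message")]
        return ApiError(
            messages=[p for p in parts if p],
            status_code=body.get("statusCode"),
        )
    return None
```

A caller can then log `trace_id` when it is present and fall back to `status_code` otherwise.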
    

HTTP Status Codes

Success Codes

Code | Status | Description
200 | OK | Request processed. Check StateText for actual result.

Important: A 200 OK response doesn’t guarantee success. Always check the StateText field in the response body.

Client Error Codes (4xx)

Code | Status | Description | Action
400 | Bad Request | Malformed request or missing parameters | Fix request format; do not retry without changes
401 | Unauthorized | Missing or invalid API key | Verify API key; do not retry without fixing
403 | Forbidden | Valid key but access denied | Check permissions and Context ID
404 | Not Found | Endpoint not found | Verify the URL is correct
413 | Payload Too Large | File exceeds 50 MB limit | Reduce file size or split document
415 | Unsupported Media Type | File is not a valid PDF | Ensure file is a genuine PDF
422 | Unprocessable Entity | Valid request but cannot process content | Check document quality and format
429 | Too Many Requests | Rate limit exceeded | Wait and retry with exponential backoff

Server Error Codes (5xx)

Code | Status | Description | Action
500 | Internal Server Error | Unexpected server error | Retry with exponential backoff; contact support if persistent
502 | Bad Gateway | Upstream service error | Retry after brief delay
503 | Service Unavailable | Service temporarily unavailable | Retry with exponential backoff
504 | Gateway Timeout | Request timed out | Retry; consider smaller documents

Error Categories

Authentication Errors

Error Message | Cause | Resolution
Invalid API key | API key not recognized | Verify your API key is correct and active
API key expired | Key has been revoked or expired | Contact support to obtain a new key
Missing API key | x-api-key header not provided | Include the x-api-key header in your request

Validation Errors

Error Message | Cause | Resolution
Missing required parameter: id | Document ID not provided | Include the id parameter with a valid UUID
Missing required parameter: contextId | Context ID not provided | Include the contextId parameter
Missing required parameter: files | No file uploaded | Attach a PDF file to the request
Invalid UUID format | ID is not a valid UUID | Use UUID v4 format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx (x is any hex digit; y is 8, 9, a, or b)
Invalid file format | File is not a PDF | Upload only PDF files
File too large | File exceeds 50 MB | Compress or split the document
Too many pages | PDF has more than 100 pages | Split into smaller documents

Processing Errors

Error Message | Cause | Resolution
Document processing failed | Unable to process document | Check document quality; retry or contact support
OCR failed | Text extraction failed | Ensure document is readable; minimum 72 DPI
Classification failed | Unable to determine document type | Document may not be a supported type
Extraction failed | Unable to extract fields | Document quality may be poor; contact support
Timeout | Processing exceeded time limit | Try a smaller document; contact support for large files

Context Errors

Error Message | Cause | Resolution
Context not found | Context ID doesn’t exist | Verify your Context ID is correct
Context mismatch | Context ID not associated with API key | Use the Context ID provided with your API key
Context disabled | Context has been deactivated | Contact support to reactivate
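
The four categories above can be used to route errors in client code, for example to decide whether to alert an operator or ask the user to fix the input. The sketch below assumes that the strings in the Messages array begin with the message text listed in these tables; the exact wording returned by the service is not guaranteed here, and `ERROR_CATEGORIES` and `categorize` are hypothetical names.

```python
# Message prefixes copied from the tables above. The category labels and the
# prefix-matching rule are assumptions made for this sketch.
ERROR_CATEGORIES = {
    "authentication": ("Invalid API key", "API key expired", "Missing API key"),
    "validation": (
        "Missing required parameter", "Invalid UUID format",
        "Invalid file format", "File too large", "Too many pages",
    ),
    "processing": (
        "Document processing failed", "OCR failed", "Classification failed",
        "Extraction failed", "Timeout",
    ),
    "context": ("Context not found", "Context mismatch", "Context disabled"),
}


def categorize(message: str) -> str:
    """Return the category whose known prefix matches `message`, else "unknown"."""
    lowered = message.lower()
    for category, prefixes in ERROR_CATEGORIES.items():
        if any(lowered.startswith(prefix.lower()) for prefix in prefixes):
            return category
    return "unknown"
```

Anything that falls into "unknown" is worth logging in full together with the TraceId.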

Retry Strategies

When to Retry

Error Type | Retry? | Strategy
4xx Client Errors (except 429) | No | Fix the request before retrying
429 Rate Limited | Yes | Wait for rate limit reset, use exponential backoff
5xx Server Errors | Yes | Retry with exponential backoff
Network Timeout | Yes | Retry with exponential backoff
Connection Refused | Yes | Retry after delay; check network
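
For HTTP status codes, the table reduces to a small predicate. This is a sketch only; `should_retry` is a hypothetical helper name, and network-level failures (timeouts, refused connections) never produce a status code, so they have to be handled as exceptions by your HTTP client.

```python
def should_retry(status_code: int) -> bool:
    """Apply the retry table: retry 429 and 5xx, never any other 4xx."""
    if status_code == 429:
        return True
    if 400 <= status_code < 500:
        return False
    return 500 <= status_code < 600
```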

Exponential Backoff

For retryable errors, implement exponential backoff to avoid overwhelming the service:

Retry 1: Wait 1 second
Retry 2: Wait 2 seconds
Retry 3: Wait 4 seconds
Retry 4: Wait 8 seconds
Retry 5: Wait 16 seconds
Maximum: 5 retries or 60 seconds total
    

Backoff Formula

wait_time = min(base_delay * (2 ^ attempt), max_delay) + random_jitter
    

Where:

  • attempt = zero-based retry index (0 for the first retry), and ^ denotes exponentiation
  • base_delay = 1 second
  • max_delay = 32 seconds
  • random_jitter = 0 to 1 second (prevents thundering herd)
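
The formula can be written directly in Python. This sketch treats attempt as zero-based so that the first retry waits about 1 second, matching the schedule listed earlier; `backoff_delay` is a hypothetical name, and the optional `jitter` argument exists only to make the output reproducible.

```python
import random
from typing import Optional

BASE_DELAY = 1.0   # seconds
MAX_DELAY = 32.0   # seconds


def backoff_delay(attempt: int, jitter: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt + 1` (attempt is zero-based)."""
    if jitter is None:
        jitter = random.uniform(0.0, 1.0)   # 0 to 1 second of random jitter
    return min(BASE_DELAY * (2 ** attempt), MAX_DELAY) + jitter


# Deterministic part of the schedule (jitter fixed at 0 for display):
print([backoff_delay(n, jitter=0.0) for n in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0]
```

Note that Python spells exponentiation as `**`; the `^` operator in Python is bitwise XOR.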

Rate Limit Handling

When you receive a 429 response, check the X-RateLimit-Reset header:

X-RateLimit-Reset: 1704067200
    

The header value is a Unix timestamp (seconds since the epoch). Wait until that time has passed before retrying.
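
Converting the header into a sleep duration might look like the sketch below. The 60-second fallback for a missing or malformed header and the 1-second minimum are assumptions made for this example, and `seconds_until_reset` is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None,
                        default: float = 60.0) -> float:
    """Seconds to sleep before retrying after a 429 response."""
    now = time.time() if now is None else now
    raw = headers.get("X-RateLimit-Reset")
    try:
        reset_at = float(raw)
    except (TypeError, ValueError):
        return default                  # header missing or not a number
    # Never return less than 1 second, even if the reset time has already passed
    return max(reset_at - now, 1.0)
```

A plain dict lookup is case-sensitive. The headers object of the requests library is case-insensitive, so it can be passed in as-is.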


Error Handling Examples

PowerShell

function Invoke-DocDigitizerApi {
    param(
        [string]$PdfPath,
        [string]$ApiKey,
        [string]$ContextId,
        [int]$MaxRetries = 3
    )

    # Note: the -Form parameter used below requires PowerShell 6.1 or later
    $headers = @{ "x-api-key" = $ApiKey }
    # Generate the document id once so every retry reuses it
    $documentId = [guid]::NewGuid().ToString()
    $attempt = 0

    while ($attempt -lt $MaxRetries) {
        $attempt++

        try {
            $form = @{
                files = Get-Item -Path $PdfPath
                id = $documentId
                contextId = $ContextId
            }

            $response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
                -Method Post -Headers $headers -Form $form -ErrorAction Stop

            # Check application-level status
            if ($response.StateText -eq "COMPLETED") {
                return $response
            }
            elseif ($response.StateText -eq "ERROR") {
                Write-Error "Processing error: $($response.Messages -join ', ')"
                return $null
            }

            # Any other state is unexpected here; return it instead of resubmitting
            Write-Warning "Unexpected StateText: $($response.StateText)"
            return $response
        }
        catch {
            # Response is null for network-level failures, so $statusCode may be $null
            $statusCode = $_.Exception.Response.StatusCode.value__

            # Don't retry client errors (except 429)
            if ($statusCode -ge 400 -and $statusCode -lt 500 -and $statusCode -ne 429) {
                Write-Error "Client error ($statusCode): $_"
                return $null
            }

            # Retry server errors, rate limits, and network failures
            if ($attempt -lt $MaxRetries) {
                # Exponential backoff: 1, 2, 4, ... seconds
                $delay = [math]::Pow(2, $attempt - 1)
                Write-Warning "Attempt $attempt failed. Retrying in $delay seconds..."
                Start-Sleep -Seconds $delay
            }
            else {
                Write-Error "Max retries exceeded: $_"
                return $null
            }
        }
    }
}
    

Python

import random
import time
import uuid

import requests


class RetryableError(Exception):
    """Raised for failures worth retrying (429 and 5xx responses)."""


def call_docdigitizer_api(pdf_path, api_key, context_id, max_retries=3):
    url = "https://apix.docdigitizer.com/sync"
    headers = {"x-api-key": api_key}
    # Generate the document id once so every retry reuses it
    document_id = str(uuid.uuid4())

    for attempt in range(max_retries):
        delay = None
        try:
            with open(pdf_path, "rb") as f:
                response = requests.post(
                    url,
                    headers=headers,
                    files={"files": f},
                    data={"id": document_id, "contextId": context_id},
                    timeout=300,
                )

            if response.status_code == 429:
                # Rate limited - wait until the reset time if the header is present
                reset_time = response.headers.get("X-RateLimit-Reset")
                if reset_time:
                    delay = max(int(reset_time) - int(time.time()), 1)
                raise RetryableError("Rate limited (429)")

            if 400 <= response.status_code < 500:
                # Client error - don't retry
                raise Exception(f"Client error {response.status_code}: {response.text}")

            if response.status_code >= 500:
                # Server error - retry with backoff
                raise RetryableError(f"Server error {response.status_code}")

            # Parse response and check the application-level status
            result = response.json()
            state = result.get("StateText")
            if state == "COMPLETED":
                return result
            if state == "ERROR":
                raise Exception(f"Processing error: {result.get('Messages', [])}")
            raise Exception(f"Unexpected StateText: {state!r}")

        except (RetryableError, requests.exceptions.RequestException) as e:
            # Only retryable failures land here; other exceptions propagate to the caller
            if attempt >= max_retries - 1:
                raise
            if delay is None:
                # Exponential backoff with jitter
                delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed ({e}). Retrying in {delay:.1f}s...")
            time.sleep(delay)

    raise Exception("Max retries exceeded")
    

JavaScript (Node.js)

// Requires node-fetch v2 (v3 is ESM-only and cannot be loaded with require)
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');
const { v4: uuidv4 } = require('uuid');

async function callDocDigitizerApi(pdfPath, apiKey, contextId, maxRetries = 3) {
    const url = 'https://apix.docdigitizer.com/sync';
    // Generate the document id once so every retry reuses it
    const documentId = uuidv4();

    for (let attempt = 0; attempt < maxRetries; attempt++) {
        let delay = null;
        try {
            const form = new FormData();
            form.append('files', fs.createReadStream(pdfPath));
            form.append('id', documentId);
            form.append('contextId', contextId);

            const response = await fetch(url, {
                method: 'POST',
                headers: { 'x-api-key': apiKey },
                body: form
            });

            // Handle rate limiting: wait until the reset time, or 60s if the header is absent
            if (response.status === 429) {
                const resetTime = response.headers.get('X-RateLimit-Reset');
                const waitSeconds = resetTime ? parseInt(resetTime, 10) - Date.now() / 1000 : 60;
                delay = Math.max(waitSeconds * 1000, 1000);
                throw new Error('Rate limited (429)');
            }

            // Client errors - don't retry
            if (response.status >= 400 && response.status < 500) {
                throw nonRetryable(`Client error ${response.status}: ${await response.text()}`);
            }

            // Server errors - retry
            if (response.status >= 500) {
                throw new Error(`Server error ${response.status}`);
            }

            const result = await response.json();

            if (result.StateText === 'COMPLETED') {
                return result;
            }
            if (result.StateText === 'ERROR') {
                throw nonRetryable(`Processing error: ${(result.Messages || []).join(', ')}`);
            }
            throw nonRetryable(`Unexpected StateText: ${result.StateText}`);

        } catch (error) {
            // Rethrow immediately for non-retryable errors or when out of attempts
            if (error.retryable === false || attempt >= maxRetries - 1) {
                throw error;
            }
            if (delay === null) {
                // Exponential backoff with jitter
                delay = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
            }
            console.log(`Attempt ${attempt + 1} failed. Retrying in ${(delay / 1000).toFixed(1)}s...`);
            await sleep(delay);
        }
    }

    throw new Error('Max retries exceeded');
}

// Marks an error so the retry loop rethrows it instead of retrying
function nonRetryable(message) {
    const error = new Error(message);
    error.retryable = false;
    return error;
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}
    

C# (.NET)

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// ApiResponse is your own type exposing StateText, TraceId, and Messages.
public async Task<ApiResponse> CallDocDigitizerApiAsync(
    string pdfPath,
    string apiKey,
    string contextId,
    int maxRetries = 3)
{
    using var client = new HttpClient();
    client.DefaultRequestHeaders.Add("x-api-key", apiKey);
    // Generate the document id once so every retry reuses it
    var documentId = Guid.NewGuid().ToString();

    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        TimeSpan? delay = null;
        try
        {
            using var form = new MultipartFormDataContent();

            var fileContent = new ByteArrayContent(await File.ReadAllBytesAsync(pdfPath));
            fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
            form.Add(fileContent, "files", Path.GetFileName(pdfPath));
            form.Add(new StringContent(documentId), "id");
            form.Add(new StringContent(contextId), "contextId");

            var response = await client.PostAsync("https://apix.docdigitizer.com/sync", form);

            // Handle rate limiting: wait until the reset time if the header is present
            if (response.StatusCode == (HttpStatusCode)429)
            {
                if (response.Headers.TryGetValues("X-RateLimit-Reset", out var values))
                {
                    var resetTime = long.Parse(values.First());
                    var waitTime = DateTimeOffset.FromUnixTimeSeconds(resetTime) - DateTimeOffset.UtcNow;
                    delay = waitTime > TimeSpan.Zero ? waitTime : TimeSpan.FromSeconds(1);
                }
                throw new HttpRequestException("Rate limited (429)");
            }

            // Client errors - don't retry (ApplicationException is not caught below)
            if ((int)response.StatusCode >= 400 && (int)response.StatusCode < 500)
            {
                var error = await response.Content.ReadAsStringAsync();
                throw new ApplicationException($"Client error {response.StatusCode}: {error}");
            }

            // Server errors - retry
            if ((int)response.StatusCode >= 500)
            {
                throw new HttpRequestException($"Server error {response.StatusCode}");
            }

            var json = await response.Content.ReadAsStringAsync();
            var result = JsonSerializer.Deserialize<ApiResponse>(json);

            if (result.StateText == "COMPLETED")
                return result;

            if (result.StateText == "ERROR")
                throw new ApplicationException($"Processing error: {string.Join(", ", result.Messages)}");

            throw new ApplicationException($"Unexpected StateText: {result.StateText}");
        }
        // Retry only transport-level failures, 429, and 5xx; timeouts surface as TaskCanceledException
        catch (Exception ex) when ((ex is HttpRequestException || ex is TaskCanceledException)
                                   && attempt < maxRetries - 1)
        {
            // Exponential backoff with jitter unless the rate-limit header set a delay
            delay ??= TimeSpan.FromSeconds(Math.Pow(2, attempt)) +
                      TimeSpan.FromMilliseconds(Random.Shared.Next(1000));
            Console.WriteLine($"Attempt {attempt + 1} failed. Retrying in {delay.Value.TotalSeconds:F1}s...");
            await Task.Delay(delay.Value);
        }
    }

    throw new Exception("Max retries exceeded");
}
    

Logging and Debugging

Always Log the TraceId

The TraceId is essential for debugging. Always log it:

# Python example
result = response.json()
logger.info(f"API call completed. TraceId: {result.get('TraceId')}, Status: {result.get('StateText')}")

# PowerShell example
Write-Host "TraceId: $($response.TraceId), Status: $($response.StateText)"
    

Information to Log

  • TraceId – Always include in logs and error reports
  • Document ID – The UUID you generated for the request
  • HTTP Status Code – The response status
  • StateText – COMPLETED, ERROR, or PROCESSING
  • Error Messages – The Messages array if present
  • Timestamp – When the request was made
  • Processing Time – From X-DD-Timer-Total header
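
One way to capture all of these fields consistently is to build a single structured record per request. The sketch below is illustrative: `build_log_record` and `log_response` are hypothetical names, the logger name is arbitrary, and the X-DD-Timer-Total value is stored as the raw header string because its unit is not specified here.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Mapping, Optional

logger = logging.getLogger("docdigitizer")


def build_log_record(document_id: str, status_code: int,
                     body: Mapping[str, Any], headers: Mapping[str, str],
                     requested_at: Optional[datetime] = None) -> dict:
    """Collect the fields listed above into one JSON-serializable dict."""
    requested_at = requested_at or datetime.now(timezone.utc)
    return {
        "trace_id": body.get("TraceId"),
        "document_id": document_id,
        "http_status": status_code,
        "state_text": body.get("StateText"),
        "messages": body.get("Messages") or [],
        "timestamp": requested_at.isoformat(),
        "processing_time": headers.get("X-DD-Timer-Total"),
    }


def log_response(record: dict) -> None:
    """Log at ERROR level for failed requests and INFO otherwise."""
    level = logging.ERROR if record["state_text"] == "ERROR" else logging.INFO
    logger.log(level, json.dumps(record))
```

Emitting one JSON object per request keeps the TraceId searchable alongside the document ID.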

Debugging Checklist

  1. Verify API key is correct (no extra whitespace)
  2. Verify Context ID is correct
  3. Check file is a valid PDF (open in PDF viewer)
  4. Check file size is under 50 MB
  5. Try with a simple, known-good document
  6. Check network connectivity to apix.docdigitizer.com
  7. Review the full error message in the response
  8. Contact support with the TraceId if issues persist

Best Practices

1. Always Check StateText

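An HTTP 200 OK status doesn’t mean success. Always check StateText:

if response.status_code == 200:
    result = response.json()
    if result["StateText"] == "COMPLETED":
        # Success - process the data
        process_result(result)   # placeholder for your own success handler
    elif result["StateText"] == "ERROR":
        # Application-level error
        handle_error(result)     # placeholder for your own error handler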
    

2. Implement Proper Retry Logic

  • Use exponential backoff for retries
  • Add random jitter to prevent thundering herd
  • Set a maximum retry count (3-5 recommended)
  • Don’t retry 4xx errors (except 429)

3. Handle Timeouts Gracefully

  • Set appropriate timeout values (5 minutes for large documents)
  • Implement client-side timeout handling
  • Consider using shorter timeouts for small documents
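
As an illustration of size-based timeouts, the sketch below picks a (connect, read) timeout pair of the kind accepted by the `timeout` argument of the requests library. The 1 MB threshold, the 60-second read timeout for small files, and the 10-second connect timeout are assumptions chosen for the example, not documented limits; only the 5-minute value comes from the guidance above.

```python
from typing import Tuple

SMALL_FILE_BYTES = 1_000_000    # assumed cut-off for a "small" document
CONNECT_TIMEOUT = 10.0          # assumed; seconds to establish the connection
SMALL_READ_TIMEOUT = 60.0       # assumed; seconds to wait for a small document
LARGE_READ_TIMEOUT = 300.0      # 5 minutes for large documents


def timeout_for_size(size_bytes: int) -> Tuple[float, float]:
    """Return a (connect, read) timeout pair based on the upload size."""
    read_timeout = SMALL_READ_TIMEOUT if size_bytes <= SMALL_FILE_BYTES else LARGE_READ_TIMEOUT
    return (CONNECT_TIMEOUT, read_timeout)
```

With requests, the result can be passed as `timeout=timeout_for_size(os.path.getsize(pdf_path))`.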

4. Log Everything

  • Log TraceId for every request
  • Log errors with full context
  • Include timestamps and document identifiers
  • Don’t log sensitive data (API keys)
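
If request headers are logged for debugging, the API key should be masked first. A minimal sketch, with `redact_headers` as a hypothetical helper name:

```python
from typing import Dict, Mapping


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` that is safe to log (x-api-key value masked)."""
    return {
        name: "[REDACTED]" if name.lower() == "x-api-key" else value
        for name, value in headers.items()
    }
```

The original mapping is left untouched, so the same headers object can still be used for the request itself.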

5. Validate Input Before Sending

  • Check file exists and is readable
  • Verify file is a PDF (check magic bytes: %PDF-)
  • Validate file size before upload
  • Validate UUID formats
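
These checks can run locally before any bytes are uploaded. The sketch below covers the four items in the list; it assumes that the 50 MB limit means 50 × 1024 × 1024 bytes, and `is_uuid4` and `validate_upload` are hypothetical helper names. The 100-page limit is not checked because that requires a PDF parser.

```python
import os
import uuid
from typing import List

MAX_FILE_BYTES = 50 * 1024 * 1024   # assumption: 50 MB interpreted as 50 MiB


def is_uuid4(value: str) -> bool:
    """True if `value` is a canonical, hyphenated, version-4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return parsed.version == 4 and str(parsed) == value.lower()


def validate_upload(pdf_path: str, document_id: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not is_uuid4(document_id):
        problems.append("id is not a valid UUID v4")
    if not os.path.isfile(pdf_path) or not os.access(pdf_path, os.R_OK):
        problems.append("file does not exist or is not readable")
        return problems   # the remaining checks need to read the file
    if os.path.getsize(pdf_path) > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    with open(pdf_path, "rb") as f:
        if f.read(5) != b"%PDF-":
            problems.append("file does not start with %PDF-")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.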

6. Use Idempotency

Generate the document id before making the request and reuse it for retries. This helps with tracking and prevents duplicate processing.
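
The key point is that the id is created outside the retry loop. The sketch below isolates that pattern: `send` stands in for whatever function performs the HTTP call, `ConnectionError` stands in for whichever retryable error your client raises, and backoff is omitted to keep the example short. All of these names are hypothetical.

```python
import uuid
from typing import Callable


def submit_with_stable_id(send: Callable[[str], dict], max_attempts: int = 3) -> dict:
    """Call `send(document_id)` up to `max_attempts` times with the same id."""
    document_id = str(uuid.uuid4())   # generated once, outside the retry loop
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(document_id)
        except ConnectionError as exc:   # stand-in for a retryable failure
            last_error = exc
    raise RuntimeError(f"giving up on document {document_id}") from last_error
```

Because every attempt carries the same id, log entries for one logical request can be grouped by that id across retries.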

7. Monitor and Alert

  • Track error rates by type
  • Alert on unusual error patterns
  • Monitor response times
  • Track rate limit usage
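
A very small in-process monitor can illustrate the first two points. This is a sketch under stated assumptions: the class name `ErrorRateMonitor`, the 100-request sliding window, and the 20% alert threshold are all hypothetical choices, and a production setup would normally export these counts to a dedicated metrics system instead.

```python
from collections import Counter, deque
from typing import Optional


class ErrorRateMonitor:
    """Tracks error counts by type and the error rate over recent requests."""

    def __init__(self, window: int = 100, threshold: float = 0.2):
        self.outcomes = deque(maxlen=window)   # True = request failed
        self.by_type = Counter()               # lifetime counts per error type
        self.threshold = threshold

    def record(self, error_type: Optional[str] = None) -> None:
        """Record one request; pass an error type such as "5xx" on failure."""
        self.outcomes.append(error_type is not None)
        if error_type is not None:
            self.by_type[error_type] += 1

    def error_rate(self) -> float:
        """Fraction of failed requests within the sliding window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        """True when the windowed error rate exceeds the threshold."""
        return self.error_rate() > self.threshold
```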

Getting Help

If you encounter persistent errors:

  1. Check this documentation for common solutions
  2. Contact support at support@docdigitizer.com

When contacting support, include:

  • TraceId from the response
  • Document ID you used
  • Timestamp of the request
  • Full error message
  • Sample document (if possible)