Invoice Processing Guide
This guide walks you through processing invoices with the DocDigitizer API. Learn how to extract invoice data, handle different invoice formats, and integrate invoice processing into your applications.
Overview
DocDigitizer’s invoice processing extracts structured data from commercial invoices, including header information, line items, totals, and payment details.
Supported Invoice Types
- Commercial invoices
- Tax invoices
- Pro forma invoices
- Credit notes
- Debit notes
- Self-billing invoices
Supported Countries
Invoice extraction supports country-specific formats for:
| Region | Countries |
|---|---|
| Europe | Portugal (PT), Spain (ES), France (FR), Germany (DE), Italy (IT), UK (GB), Netherlands (NL), Belgium (BE) |
| Americas | United States (US), Brazil (BR), Mexico (MX), Canada (CA) |
| Other | Generic format for unlisted countries |
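If your integration needs to know whether an invoice's country has a dedicated format, one option is to keep the codes from the table in a lookup set. This is a minimal sketch: the set and the helper name `has_country_specific_format` are illustrative and not part of the DocDigitizer API, and the list needs manual updating if supported countries change.

```python
# Country codes copied from the Supported Countries table.
# Anything not listed here is assumed to fall back to the generic format.
SUPPORTED_COUNTRIES = {
    "PT", "ES", "FR", "DE", "IT", "GB", "NL", "BE",  # Europe
    "US", "BR", "MX", "CA",                          # Americas
}


def has_country_specific_format(country_code):
    """Return True if the code appears in the supported-countries table."""
    if not country_code:
        return False
    return country_code.strip().upper() in SUPPORTED_COUNTRIES


print(has_country_specific_format("PT"))  # True
print(has_country_specific_format("JP"))  # False
```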
Invoice Fields
The following fields are extracted from invoices. Field availability depends on the invoice content.
Header Fields
| Field | Type | Description |
|---|---|---|
| invoiceNumber | String | Invoice number or reference |
| invoiceDate | String (Date) | Invoice issue date (YYYY-MM-DD) |
| dueDate | String (Date) | Payment due date |
| purchaseOrderNumber | String | Related PO number |
| currency | String | Currency code (EUR, USD, GBP, etc.) |
Vendor (Seller) Fields
| Field | Type | Description |
|---|---|---|
| vendorName | String | Vendor/supplier company name |
| vendorTaxId | String | Vendor tax ID / VAT number |
| vendorAddress | String | Full vendor address |
| vendorEmail | String | Vendor email address |
| vendorPhone | String | Vendor phone number |
Customer (Buyer) Fields
| Field | Type | Description |
|---|---|---|
| customerName | String | Customer/buyer company name |
| customerTaxId | String | Customer tax ID / VAT number |
| customerAddress | String | Full customer address |
| shippingAddress | String | Shipping/delivery address (if different) |
Financial Fields
| Field | Type | Description |
|---|---|---|
| subtotal | Number | Total before tax |
| taxRate | Number | Tax rate percentage |
| taxAmount | Number | Total tax amount |
| discount | Number | Discount amount |
| totalAmount | Number | Grand total including tax |
Payment Fields
| Field | Type | Description |
|---|---|---|
| paymentTerms | String | Payment terms (Net 30, etc.) |
| paymentMethod | String | Payment method |
| bankName | String | Bank name |
| bankAccount | String | Bank account number / IBAN |
| swiftCode | String | SWIFT/BIC code |
Line Item Fields
| Field | Type | Description |
|---|---|---|
| lineItems | Array | Array of line item objects |
| lineItems[].description | String | Item description |
| lineItems[].quantity | Number | Quantity |
| lineItems[].unit | String | Unit of measure |
| lineItems[].unitPrice | Number | Price per unit |
| lineItems[].amount | Number | Line total |
| lineItems[].taxRate | Number | Tax rate for this item |
| lineItems[].productCode | String | Product/SKU code |
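Because field availability depends on the invoice content, it can help to check an extraction for the fields your workflow cannot do without before using it. The sketch below is one way to do that. The `REQUIRED_FIELDS` tuple and the helper name are assumptions chosen for illustration, and the code treats absent keys, `None`, and empty strings alike because the guide does not specify how missing fields are represented.

```python
# Fields this hypothetical integration treats as mandatory; adjust as needed.
REQUIRED_FIELDS = ("invoiceNumber", "invoiceDate", "vendorName", "totalAmount", "currency")


def find_missing_fields(extraction, required=REQUIRED_FIELDS):
    """Return the required field names that are absent, None, or blank strings."""
    missing = []
    for name in required:
        value = extraction.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


sample = {
    "invoiceNumber": "FT 2024/00156",
    "invoiceDate": "2024-01-15",
    "totalAmount": 3075.00,
    "currency": "",
}
print(find_missing_fields(sample))  # ['vendorName', 'currency']
```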
Processing Invoices
Basic Invoice Processing
cURL
curl -X POST https://apix.docdigitizer.com/sync \
-H "x-api-key: YOUR_API_KEY" \
-F "files=@invoice.pdf" \
-F "id=$(uuidgen)" \
-F "contextId=YOUR_CONTEXT_ID"
PowerShell
# Process an invoice (the -Form parameter requires PowerShell 6.1 or later)
$headers = @{ "x-api-key" = "YOUR_API_KEY" }
$form = @{
    files = Get-Item -Path "invoice.pdf"
    id = [guid]::NewGuid().ToString()
    contextId = "YOUR_CONTEXT_ID"
}
$response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
    -Method Post -Headers $headers -Form $form
# Check result
if ($response.StateText -eq "COMPLETED") {
    $invoice = $response.Output | Where-Object { $_.docType -eq "Invoice" } | Select-Object -First 1
    Write-Host "Invoice Number: $($invoice.extraction.invoiceNumber)"
    Write-Host "Total: $($invoice.extraction.totalAmount) $($invoice.extraction.currency)"
}
Python
import requests
import uuid

def process_invoice(pdf_path, api_key, context_id):
    """Send one PDF to the sync endpoint and return the invoice extraction, or None."""
    # Open the file in a context manager so the handle is always closed
    with open(pdf_path, "rb") as f:
        response = requests.post(
            "https://apix.docdigitizer.com/sync",
            headers={"x-api-key": api_key},
            files={"files": f},
            data={
                "id": str(uuid.uuid4()),
                "contextId": context_id
            },
            timeout=300  # avoid hanging forever on a stalled connection
        )
    result = response.json()
    if result.get("StateText") == "COMPLETED":
        # Find the invoice in the output
        for doc in result.get("Output", []):
            if doc.get("docType") == "Invoice":
                return doc.get("extraction")
    return None

# Usage
invoice_data = process_invoice("invoice.pdf", "YOUR_API_KEY", "YOUR_CONTEXT_ID")
if invoice_data:
    print(f"Invoice: {invoice_data.get('invoiceNumber')}")
    print(f"Total: {invoice_data.get('totalAmount')} {invoice_data.get('currency')}")
Working with Results
Example Response
{
  "StateText": "COMPLETED",
  "TraceId": "INV4567",
  "NumberPages": 2,
  "Output": [
    {
      "docType": "Invoice",
      "country": "PT",
      "pages": [1, 2],
      "schema": "Invoice_PT.json",
      "extraction": {
        "invoiceNumber": "FT 2024/00156",
        "invoiceDate": "2024-01-15",
        "dueDate": "2024-02-14",
        "vendorName": "Tech Solutions, Lda",
        "vendorTaxId": "PT509876543",
        "vendorAddress": "Av. da Liberdade, 100, 1250-096 Lisboa",
        "customerName": "Global Corp, SA",
        "customerTaxId": "PT501234567",
        "customerAddress": "Rua do Ouro, 50, 1100-063 Lisboa",
        "subtotal": 2500.00,
        "taxRate": 23,
        "taxAmount": 575.00,
        "totalAmount": 3075.00,
        "currency": "EUR",
        "paymentTerms": "30 dias",
        "bankAccount": "PT50 0035 0000 12345678901 94",
        "lineItems": [
          {
            "description": "Software Development Services",
            "quantity": 40,
            "unit": "hours",
            "unitPrice": 50.00,
            "amount": 2000.00,
            "taxRate": 23
          },
          {
            "description": "Cloud Hosting - January",
            "quantity": 1,
            "unit": "month",
            "unitPrice": 500.00,
            "amount": 500.00,
            "taxRate": 23
          }
        ]
      }
    }
  ]
}
Accessing Invoice Data
PowerShell
# Assuming $response contains the API response
$invoice = $response.Output[0].extraction
# Header information
$invoiceNumber = $invoice.invoiceNumber
$invoiceDate = $invoice.invoiceDate
$dueDate = $invoice.dueDate
# Vendor information
$vendor = @{
    Name = $invoice.vendorName
    TaxId = $invoice.vendorTaxId
    Address = $invoice.vendorAddress
}
# Financial summary
$totals = @{
    Subtotal = $invoice.subtotal
    Tax = $invoice.taxAmount
    Total = $invoice.totalAmount
    Currency = $invoice.currency
}
# Display summary
Write-Host "=== Invoice $invoiceNumber ==="
Write-Host "Date: $invoiceDate | Due: $dueDate"
Write-Host "Vendor: $($vendor.Name) ($($vendor.TaxId))"
Write-Host "Total: $($totals.Total) $($totals.Currency)"
Python
def display_invoice(extraction):
    """Display invoice data in a readable format."""
    print(f"{'='*50}")
    print(f"INVOICE: {extraction.get('invoiceNumber', 'N/A')}")
    print(f"{'='*50}")
    print(f"\nDate: {extraction.get('invoiceDate', 'N/A')}")
    print(f"Due: {extraction.get('dueDate', 'N/A')}")
    print(f"\n--- VENDOR ---")
    print(f"Name: {extraction.get('vendorName', 'N/A')}")
    print(f"Tax ID: {extraction.get('vendorTaxId', 'N/A')}")
    print(f"Address: {extraction.get('vendorAddress', 'N/A')}")
    print(f"\n--- CUSTOMER ---")
    print(f"Name: {extraction.get('customerName', 'N/A')}")
    print(f"Tax ID: {extraction.get('customerTaxId', 'N/A')}")
    print(f"\n--- TOTALS ---")
    currency = extraction.get('currency', 'EUR')
    print(f"Subtotal: {extraction.get('subtotal', 0):.2f} {currency}")
    print(f"Tax: {extraction.get('taxAmount', 0):.2f} {currency}")
    print(f"TOTAL: {extraction.get('totalAmount', 0):.2f} {currency}")
Handling Line Items
Line items are returned as an array. Here’s how to process them:
PowerShell
# Process line items
$lineItems = $invoice.lineItems
Write-Host "`n=== Line Items ==="
Write-Host ("-" * 80)
Write-Host ("{0,-40} {1,10} {2,12} {3,12}" -f "Description", "Qty", "Unit Price", "Amount")
Write-Host ("-" * 80)
$total = 0
foreach ($item in $lineItems) {
    # Truncate long descriptions to the 40-character column; a missing description becomes ""
    $desc = [string]$item.description
    if ($desc.Length -gt 40) { $desc = $desc.Substring(0, 40) }
    Write-Host ("{0,-40} {1,10} {2,12:N2} {3,12:N2}" -f $desc, $item.quantity, $item.unitPrice, $item.amount)
    $total += $item.amount
}
Write-Host ("-" * 80)
Write-Host ("{0,-40} {1,10} {2,12} {3,12:N2}" -f "TOTAL", "", "", $total)
Python
def process_line_items(extraction):
    """Process and display line items."""
    line_items = extraction.get('lineItems', [])
    if not line_items:
        print("No line items found")
        return
    print(f"\n{'Description':<40} {'Qty':>8} {'Price':>12} {'Amount':>12}")
    print("-" * 75)
    total = 0
    for item in line_items:
        desc = item.get('description', 'N/A')[:40]
        qty = item.get('quantity', 0)
        price = item.get('unitPrice', 0)
        amount = item.get('amount', 0)
        print(f"{desc:<40} {qty:>8} {price:>12.2f} {amount:>12.2f}")
        total += amount
    print("-" * 75)
    print(f"{'TOTAL':<40} {'':>8} {'':>12} {total:>12.2f}")
    return line_items
Validating Line Items
def validate_line_items(extraction):
    """Validate that line items sum to subtotal."""
    line_items = extraction.get('lineItems', [])
    subtotal = extraction.get('subtotal', 0)
    calculated_total = sum(item.get('amount', 0) for item in line_items)
    # Allow small rounding differences
    tolerance = 0.01
    if abs(calculated_total - subtotal) > tolerance:
        print(f"Warning: Line items ({calculated_total:.2f}) don't match subtotal ({subtotal:.2f})")
        return False
    return True
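The same tolerance-based comparison can be applied to the header totals. The sketch below checks that `taxAmount` matches `subtotal` times `taxRate`, and that `subtotal` plus `taxAmount` equals `totalAmount`. It assumes a single tax rate and no discount, which holds for the example response in this guide but not for every invoice. The function name `check_totals` is illustrative.

```python
import math


def check_totals(extraction, tolerance=0.01):
    """Return a list of problems found in the invoice totals (empty if none).

    Assumes one tax rate for the whole invoice and no discount; invoices
    that break either assumption need a different rule.
    """
    problems = []
    subtotal = extraction.get("subtotal")
    tax_rate = extraction.get("taxRate")
    tax_amount = extraction.get("taxAmount")
    total = extraction.get("totalAmount")

    # Only run a check when every value it needs was extracted.
    if None not in (subtotal, tax_rate, tax_amount):
        expected_tax = subtotal * tax_rate / 100
        if not math.isclose(expected_tax, tax_amount, abs_tol=tolerance):
            problems.append(
                f"taxAmount {tax_amount:.2f} does not match subtotal x taxRate ({expected_tax:.2f})"
            )

    if None not in (subtotal, tax_amount, total):
        expected_total = subtotal + tax_amount
        if not math.isclose(expected_total, total, abs_tol=tolerance):
            problems.append(
                f"totalAmount {total:.2f} does not match subtotal + taxAmount ({expected_total:.2f})"
            )
    return problems


# Values from the example response: 2500.00 at 23% gives 575.00 tax and 3075.00 total
print(check_totals({"subtotal": 2500.00, "taxRate": 23, "taxAmount": 575.00, "totalAmount": 3075.00}))  # []
```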
Multi-Page Invoices
DocDigitizer handles multi-page invoices automatically, combining data from all pages into a single extraction result.
How It Works
- All pages of an invoice are analyzed together
- Header information is typically taken from page 1
- Line items are collected from all pages
- Totals are extracted from the summary section (often last page)
Identifying Page Numbers
The pages array in the response shows which pages contain the invoice:
{
  "docType": "Invoice",
  "pages": [1, 2, 3], // Invoice spans pages 1-3
  "extraction": { ... }
}
Example: Processing Multi-Page Invoice
# Check if invoice spans multiple pages
$invoice = $response.Output[0]
$pageCount = $invoice.pages.Count
if ($pageCount -gt 1) {
    Write-Host "Multi-page invoice detected: Pages $($invoice.pages -join ', ')"
}
# Line items from all pages are combined
$totalLineItems = $invoice.extraction.lineItems.Count
Write-Host "Total line items extracted: $totalLineItems"
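A Python counterpart might summarize every document in the `Output` array rather than only the first. The sketch below relies solely on the `docType`, `pages`, and `extraction.lineItems` keys shown in the example response. The helper name `summarize_output` and the summary keys `multiPage` and `lineItemCount` are illustrative, not API fields.

```python
def summarize_output(result):
    """Return one summary dict per document found in result["Output"]."""
    summaries = []
    for doc in result.get("Output", []):
        pages = doc.get("pages", [])
        extraction = doc.get("extraction") or {}
        summaries.append({
            "docType": doc.get("docType"),
            "pages": pages,
            "multiPage": len(pages) > 1,
            "lineItemCount": len(extraction.get("lineItems") or []),
        })
    return summaries


# Trimmed-down response with the same shape as the example in this guide
result = {
    "StateText": "COMPLETED",
    "Output": [
        {"docType": "Invoice", "pages": [1, 2, 3],
         "extraction": {"lineItems": [{"amount": 2000.00}, {"amount": 500.00}]}},
    ],
}
for summary in summarize_output(result):
    print(summary)
```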
Batch Processing
To process many invoices, loop over a folder, submit each file to the sync endpoint, pause between requests, and collect the results in a CSV report.
PowerShell Batch Processing
# Requires PowerShell 6.1 or later (Invoke-RestMethod -Form)
function Process-InvoiceBatch {
    param(
        [string]$FolderPath,
        [string]$ApiKey,
        [string]$ContextId,
        [string]$OutputCsv = "results.csv"
    )
    $pdfFiles = Get-ChildItem -Path $FolderPath -Filter "*.pdf"
    $results = @()
    foreach ($pdf in $pdfFiles) {
        Write-Host "Processing: $($pdf.Name)..."
        # Start from an "Error" row and fill it in as far as processing gets,
        # so every file produces exactly one row (including COMPLETED responses with no invoice)
        $row = [ordered]@{
            FileName = $pdf.Name
            Status = "Error"
            InvoiceNumber = ""
            VendorName = ""
            InvoiceDate = ""
            TotalAmount = ""
            Currency = ""
            TraceId = ""
        }
        try {
            $form = @{
                files = Get-Item -Path $pdf.FullName
                id = [guid]::NewGuid().ToString()
                contextId = $ContextId
            }
            $response = Invoke-RestMethod -Uri "https://apix.docdigitizer.com/sync" `
                -Method Post `
                -Headers @{ "x-api-key" = $ApiKey } `
                -Form $form
            $row.TraceId = $response.TraceId
            if ($response.StateText -eq "COMPLETED") {
                # Take the first invoice only, so the row holds single values
                $invoice = $response.Output | Where-Object { $_.docType -eq "Invoice" } | Select-Object -First 1
                if ($invoice) {
                    $row.Status = "Success"
                    $row.InvoiceNumber = $invoice.extraction.invoiceNumber
                    $row.VendorName = $invoice.extraction.vendorName
                    $row.InvoiceDate = $invoice.extraction.invoiceDate
                    $row.TotalAmount = $invoice.extraction.totalAmount
                    $row.Currency = $invoice.extraction.currency
                }
            }
        }
        catch {
            $row.Status = "Failed: $_"
        }
        $results += [PSCustomObject]$row
        # Respect rate limits
        Start-Sleep -Milliseconds 500
    }
    # Export results
    $results | Export-Csv -Path $OutputCsv -NoTypeInformation
    Write-Host "`nProcessed $($results.Count) invoices. Results saved to $OutputCsv"
    return $results
}
# Usage
Process-InvoiceBatch -FolderPath "C:\Invoices" -ApiKey "YOUR_KEY" -ContextId "YOUR_CONTEXT"
Python Batch Processing
import csv
import uuid
import time
import requests
from pathlib import Path

# Column order for the CSV report (fixed, so an empty folder still yields a valid file)
FIELDNAMES = [
    "filename", "status", "invoice_number", "vendor_name",
    "invoice_date", "total_amount", "currency", "trace_id"
]

def process_invoice_batch(folder_path, api_key, context_id, output_csv="results.csv"):
    """Process all PDF invoices in a folder."""
    pdf_files = list(Path(folder_path).glob("*.pdf"))
    results = []
    for pdf_file in pdf_files:
        print(f"Processing: {pdf_file.name}...")
        try:
            with open(pdf_file, "rb") as f:
                response = requests.post(
                    "https://apix.docdigitizer.com/sync",
                    headers={"x-api-key": api_key},
                    files={"files": f},
                    data={
                        "id": str(uuid.uuid4()),
                        "contextId": context_id
                    },
                    timeout=300
                )
            result = response.json()
            if result.get("StateText") == "COMPLETED":
                invoice = next(
                    (doc for doc in result.get("Output", [])
                     if doc.get("docType") == "Invoice"),
                    None
                )
                if invoice:
                    ext = invoice.get("extraction", {})
                    results.append({
                        "filename": pdf_file.name,
                        "status": "Success",
                        "invoice_number": ext.get("invoiceNumber", ""),
                        "vendor_name": ext.get("vendorName", ""),
                        "invoice_date": ext.get("invoiceDate", ""),
                        "total_amount": ext.get("totalAmount", ""),
                        "currency": ext.get("currency", ""),
                        "trace_id": result.get("TraceId", "")
                    })
                    continue
            # Not completed, or completed without an invoice in the output
            results.append({
                "filename": pdf_file.name,
                "status": "Error",
                "invoice_number": "",
                "vendor_name": "",
                "invoice_date": "",
                "total_amount": "",
                "currency": "",
                "trace_id": result.get("TraceId", "")
            })
        except Exception as e:
            results.append({
                "filename": pdf_file.name,
                "status": f"Failed: {e}",
                "invoice_number": "",
                "vendor_name": "",
                "invoice_date": "",
                "total_amount": "",
                "currency": "",
                "trace_id": ""
            })
        finally:
            # Respect rate limits; runs even when the success path hits `continue`
            time.sleep(0.5)
    # Export to CSV
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(results)
    print(f"\nProcessed {len(results)} invoices. Results saved to {output_csv}")
    return results

# Usage
process_invoice_batch("/path/to/invoices", "YOUR_API_KEY", "YOUR_CONTEXT_ID")
Integration Examples
Export to Accounting Software
def export_to_accounting_format(extraction):
    """Convert extraction to common accounting import format."""
    return {
        "document_type": "INVOICE",
        "document_number": extraction.get("invoiceNumber"),
        "document_date": extraction.get("invoiceDate"),
        "due_date": extraction.get("dueDate"),
        "supplier": {
            "name": extraction.get("vendorName"),
            "tax_id": extraction.get("vendorTaxId"),
            "address": extraction.get("vendorAddress")
        },
        "amounts": {
            "net_amount": extraction.get("subtotal"),
            "tax_amount": extraction.get("taxAmount"),
            "gross_amount": extraction.get("totalAmount"),
            "currency": extraction.get("currency")
        },
        "lines": [
            {
                "description": item.get("description"),
                "quantity": item.get("quantity"),
                "unit_price": item.get("unitPrice"),
                "amount": item.get("amount"),
                "tax_rate": item.get("taxRate")
            }
            for item in extraction.get("lineItems", [])
        ],
        "payment": {
            "terms": extraction.get("paymentTerms"),
            "bank_account": extraction.get("bankAccount")
        }
    }
Database Integration
import sqlite3

def save_invoice_to_db(extraction, db_path="invoices.db"):
    """Save extracted invoice to SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        # Create tables if not exist
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS invoices (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                invoice_number TEXT,
                invoice_date TEXT,
                due_date TEXT,
                vendor_name TEXT,
                vendor_tax_id TEXT,
                customer_name TEXT,
                subtotal REAL,
                tax_amount REAL,
                total_amount REAL,
                currency TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS invoice_lines (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                invoice_id INTEGER,
                description TEXT,
                quantity REAL,
                unit_price REAL,
                amount REAL,
                FOREIGN KEY (invoice_id) REFERENCES invoices (id)
            )
        ''')
        # Insert invoice
        cursor.execute('''
            INSERT INTO invoices (
                invoice_number, invoice_date, due_date, vendor_name, vendor_tax_id,
                customer_name, subtotal, tax_amount, total_amount, currency
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ''', (
            extraction.get("invoiceNumber"),
            extraction.get("invoiceDate"),
            extraction.get("dueDate"),
            extraction.get("vendorName"),
            extraction.get("vendorTaxId"),
            extraction.get("customerName"),
            extraction.get("subtotal"),
            extraction.get("taxAmount"),
            extraction.get("totalAmount"),
            extraction.get("currency")
        ))
        invoice_id = cursor.lastrowid
        # Insert line items
        for item in extraction.get("lineItems", []):
            cursor.execute('''
                INSERT INTO invoice_lines (invoice_id, description, quantity, unit_price, amount)
                VALUES (?, ?, ?, ?, ?)
            ''', (
                invoice_id,
                item.get("description"),
                item.get("quantity"),
                item.get("unitPrice"),
                item.get("amount")
            ))
        conn.commit()
        return invoice_id
    finally:
        # Close the connection even if an insert fails; uncommitted changes are discarded
        conn.close()
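Re-running a batch would insert the same invoice twice, since the table has no uniqueness constraint. One possible guard is to look the invoice up before saving it. The sketch below assumes that the pair of invoice number and vendor tax ID identifies an invoice, which is a modelling choice rather than something the API guarantees, and `invoice_exists` is a hypothetical helper name. The demo uses an in-memory database with a cut-down `invoices` table.

```python
import sqlite3


def invoice_exists(conn, invoice_number, vendor_tax_id):
    """Return True if an invoice with this number and vendor tax ID is already stored."""
    row = conn.execute(
        "SELECT 1 FROM invoices WHERE invoice_number = ? AND vendor_tax_id = ? LIMIT 1",
        (invoice_number, vendor_tax_id),
    ).fetchone()
    return row is not None


# Demo against an in-memory database holding only the two lookup columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_number TEXT, vendor_tax_id TEXT)")
conn.execute("INSERT INTO invoices VALUES (?, ?)", ("FT 2024/00156", "PT509876543"))
print(invoice_exists(conn, "FT 2024/00156", "PT509876543"))  # True
print(invoice_exists(conn, "FT 2024/00157", "PT509876543"))  # False
conn.close()
```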
Best Practices
Document Quality
- Scan invoices at a minimum of 150 DPI (300 DPI recommended)
- Ensure text is legible and not blurry
- Avoid skewed or rotated documents
- Native PDFs (not scanned) provide the best results
Data Validation
- Validate extracted totals against line item sums
- Check that tax calculations are correct
- Verify that vendor/customer tax IDs are in the correct format
- Validate date formats before database insertion
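As an illustration of the date check, the sketch below parses `invoiceDate` and `dueDate` and reports anything that is not a real calendar date or that puts the due date before the issue date. The field table documents YYYY-MM-DD for `invoiceDate` only, so applying the same format to `dueDate` is an assumption. The helper names are illustrative.

```python
from datetime import datetime


def parse_invoice_date(value):
    """Return a datetime.date for a year-month-day string, or None if it does not parse."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def check_dates(extraction):
    """Return a list of date problems (empty if none). Absent dates are not flagged."""
    problems = []
    parsed = {}
    for field in ("invoiceDate", "dueDate"):
        raw = extraction.get(field)
        if raw in (None, ""):
            continue
        parsed[field] = parse_invoice_date(raw)
        if parsed[field] is None:
            problems.append(f"{field} is not a valid YYYY-MM-DD date")
    issued, due = parsed.get("invoiceDate"), parsed.get("dueDate")
    if issued and due and due < issued:
        problems.append("dueDate is earlier than invoiceDate")
    return problems


print(check_dates({"invoiceDate": "2024-01-15", "dueDate": "2024-02-14"}))  # []
print(check_dates({"invoiceDate": "15/01/2024"}))  # ['invoiceDate is not a valid YYYY-MM-DD date']
```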
Error Handling
- Always check StateText for “COMPLETED”
- Handle missing fields gracefully (use defaults)
- Log TraceId for troubleshooting
- Implement retry logic for transient failures
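Retry logic is often implemented as a wrapper with exponential backoff. The sketch below is generic: it retries any callable that raises `TransientError`, a hypothetical exception class defined here for illustration. This guide does not say which failures are transient, so mapping timeouts or particular HTTP status codes onto that exception is left to your integration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example, timeouts)."""


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Only TransientError is retried; any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: wrap the upload in a zero-argument callable, for example
#   call_with_retry(lambda: upload("invoice.pdf"))
# where upload is your own function that raises TransientError on retryable failures.
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.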
Performance
- Respect rate limits (use delays between requests)
- Process invoices in parallel where possible (within limits)
- Cache results for already-processed invoices
- Use batch processing for large volumes
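One way to cache results for already-processed invoices is to key them by a hash of the file contents, so a renamed copy of the same PDF still hits the cache. The sketch below stores each result as a JSON file in a local folder. The folder name, the helper names, and the `process` callable (any function that takes a path and returns a JSON-serializable result) are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_process(pdf_path, process, cache_dir="invoice_cache"):
    """Return the cached result for pdf_path, calling process(pdf_path) on a miss."""
    cache_file = Path(cache_dir) / f"{file_digest(pdf_path)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = process(pdf_path)
    # Only cache real results, so failed extractions are retried next time
    if result is not None:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because failed extractions (a `None` result) are not cached, a later run submits those files again.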