Changelog
New features, improvements, and fixes — shipped regularly.
MCP Server alpha for Claude Desktop
Install the DocDigitizer MCP Server in Claude Desktop and extract documents conversationally. Pass a file path or URL — get structured JSON back in the chat.
Schema auto-detection improvements
15% better field coverage on invoices and receipts. The extraction model now recognizes line-item discounts, multi-currency totals, and payment reference codes.
40% faster multi-page processing
Documents with 5+ pages now process 40% faster thanks to parallelized page-level extraction. Average processing time for a 20-page contract dropped from 12s to 7s.
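For readers curious about what page-level parallelism looks like in general, here is a minimal sketch using Python's standard thread pool. It is not the service's actual implementation: extract_page is a hypothetical stand-in that only counts characters and words, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_page(page_text: str) -> dict:
    """Hypothetical stand-in for a per-page extraction step."""
    return {"chars": len(page_text), "words": len(page_text.split())}


def extract_document(pages: list[str], max_workers: int = 4) -> list[dict]:
    """Process every page concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with pages.
        return list(pool.map(extract_page, pages))


pages = ["Invoice 1001 total 250.00", "Terms net 30", "Signature page"]
results = extract_document(pages)
```

Because each page is handled independently, total time for a long document tends toward the slowest page rather than the sum of all pages, as long as enough workers are available.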
Python SDK 2.1.0
New batch processing method: client.extract_batch(files, on_progress=callback). Also: better error messages, retry logic for transient failures, and type stubs for IDE autocomplete.
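A rough sketch of the progress-callback pattern behind extract_batch. Only the call shape client.extract_batch(files, on_progress=callback) comes from this release note. StubClient is a hypothetical stand-in so the example runs without the SDK or a network connection, and the callback arguments (done, total) and the result dictionaries are assumptions, so check the SDK reference for the real ones.

```python
class StubClient:
    """Hypothetical stand-in for the SDK client; not the real implementation."""

    def extract_batch(self, files, on_progress=None):
        files = list(files)
        results = []
        for done, path in enumerate(files, start=1):
            # A real client would upload and extract the file here.
            results.append({"file": path, "status": "ok"})
            if on_progress is not None:
                # Assumed callback arguments: files finished so far, total files.
                on_progress(done, len(files))
        return results


progress_log = []


def callback(done, total):
    """Record progress; a real callback might update a progress bar instead."""
    progress_log.append(f"{done}/{total}")


client = StubClient()
files = ["invoice.pdf", "receipt.png", "contract.pdf"]
results = client.extract_batch(files, on_progress=callback)
```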
Custom schema extraction
Pass a JSON Schema to the /extract endpoint and get back exactly the fields you define. Useful when you know your output format and want deterministic structure.
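As an illustration, a JSON Schema for a simple invoice might look like the sketch below. The field names are made up for the example, and the request body format expected by /extract is not shown here, so the sketch only builds and serializes the schema and then checks locally that a sample result carries exactly the declared fields.

```python
import json

# Example schema; the field names are illustrative assumptions.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total", "currency"],
    "additionalProperties": False,
}


def has_exact_fields(result: dict, schema: dict) -> bool:
    """Return True when the result's keys equal the schema's declared properties."""
    return set(result) == set(schema["properties"])


# Serialized form of the schema, ready to include in a request.
schema_json = json.dumps(invoice_schema)

# Hand-written sample, not real API output.
sample_result = {"invoice_number": "INV-1001", "total": 250.0, "currency": "EUR"}
```

Setting additionalProperties to false and listing every field under required is the standard JSON Schema way to say "these fields and nothing else".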
Multi-document detection improvements
Packets containing mixed document types (e.g. invoice + receipt + ID) are now separated and classified individually. Each sub-document returns its own extraction result.
CLI tool 2× faster
Cold start time is down from 2.4s to 0.8s. Batch folder processing now runs on parallel workers, and a new --output csv flag exports results directly to CSV.
Node.js SDK launch
Install with npm install docdigitizer. The SDK is TypeScript-first and promise-based, and works with Express, Next.js, and serverless environments. Full parity with the Python SDK.