06 Jan Why Cognitive Data Capture Is Not There Yet
One of the first questions we hear, usually within 60 seconds of meeting a potential customer, is:
“What makes DocDigitizer different from any other cognitive data capture or OCR vendor?”
The question is truly significant. I’ll dive directly into the answer and then explore why, at
DocDigitizer, we are taking a radically different approach than anyone else.
What makes DocDigitizer different? You send DocDigitizer a document, and you receive curated,
actionable, structured data. Period. There’s no need for extra revision work.
Let me try to explain the problem. Imagine I deliver a box of 100 balls to you. Some are blue,
some are red. Imagine there is a system that prints a label stating the ball color and sticks it to
each ball. But the system is not perfect and 5% of the time it prints the wrong color, i.e. puts a
blue sticker on a red ball or vice-versa.
Now the critical question: “By looking at the sticker on each ball, which balls have the wrong
sticker?” And the answer is… you simply don’t know. You will have to check each and every ball.
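The arithmetic behind the ball example can be sketched in a few lines of Python. This is only an illustration of the thought experiment, not a description of any real extraction system: the function names `label_balls` and `audit`, the 50/50 split between red and blue, and the fixed random seed are all assumptions made for the example. It shows that although only about 5 of the 100 stickers are expected to be wrong, finding them still means inspecting all 100 balls.

```python
import random


def label_balls(n_balls=100, error_rate=0.05, seed=0):
    """Return a list of (true_color, sticker) pairs.

    Hypothetical labeling machine: with probability `error_rate`
    it prints the opposite color on the sticker.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    balls = []
    for _ in range(n_balls):
        true_color = rng.choice(["red", "blue"])
        if rng.random() < error_rate:
            sticker = "blue" if true_color == "red" else "red"
        else:
            sticker = true_color
        balls.append((true_color, sticker))
    return balls


def audit(balls):
    """Find the wrong stickers the only way possible: look at every ball."""
    inspected = 0
    wrong = []
    for index, (true_color, sticker) in enumerate(balls):
        inspected += 1  # the sticker alone never says whether it is wrong
        if sticker != true_color:
            wrong.append(index)
    return wrong, inspected


balls = label_balls()
wrong, inspected = audit(balls)
print(f"expected wrong stickers: {100 * 0.05:.0f}")
print(f"wrong stickers in this run: {len(wrong)}")
print(f"balls inspected to find them: {inspected}")
```

However few stickers turn out to be wrong in a given run, the number of balls inspected is always the full box, which is exactly the revision cost described above.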
DocDigitizer on the other hand delivers you accurate, verified content that can be used without
additional validation. That is our core philosophy, and it makes all the difference.
That is the problem with competing solutions out there. If you are a bank, insurer, government,
or small accounting firm, you can’t afford to transfer the wrong amount to the wrong account
number even if it only happens 5% of the time. If you are running a KYC (know your customer) due
diligence process, you don’t want to check the wrong data or the wrong person.
AI/ML and Humans
At DocDigitizer we clearly understood the problem. No matter what technology our customers use
(cloud OCR, cognitive extraction, etc.), they end up paying for revision services to verify all of the
content again. In some cases, the cost is about the same. What we found out is that the rate of
human error in the revision process can be as high as 10%.
But in 2020, machines are still not capable of verifying documents fully autonomously.
Think of the current status quo for autonomous driving. We trust the autopilot only in very
limited circumstances, and only with a driver able to take control of the vehicle if necessary.
To solve this conundrum, DocDigitizer takes advantage of the huge potential of artificial
intelligence and machine learning and also provides fully certified and accurate content. We
realized that human intervention is still needed. The conventional approach is to split the process
into two parts. First, you send the documents to an OCR/cognitive extraction system. Second, you
perform the revision.
DocDigitizer’s approach is radically different. Although there are still two clearly distinct
components (the machine and the humans), the process is tightly coupled. I could say that we
built a complex event processor engine with a real-time machine learning process but then I might
lose you. Simply put, the human revision process occurs in real-time and feeds the learning
process in real-time. It’s like having a personal coach who tells the computer what it’s doing right
To build such a system we had to rethink human verification of content and data. Along the
way, we learned a few things. Having AI/ML algorithms that understand humans is just as
important as having AI/ML algorithms that extract the information.
At DocDigitizer we know when an editor is starting to get tired, is making mistakes, or needs
to take a break. We understand which type of content needs more verification and why. And this
is tailored and customized for each specific editor. It is actually fascinating as we enter the field of
psychology. We also had to add gamification concepts to help editors remain engaged and focused.
Getting back to that first and most important question: What makes DocDigitizer different? You
send DocDigitizer a document, and you receive curated, actionable, structured data without the need
for extra revision work.
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
DocDigitizer is a Portuguese startup founded as a spin-off of Infosistema in 2016. It is dedicated to the development of intelligent process automation technologies through the use of machine learning.
Since its founding, DocDigitizer has recorded annual growth rates in excess of 100% and is now supporting leading companies in the banking, insurance and financial services industries in more than 6 geographies.
Visit the website to learn more: www.docdigitizer.com