Why Cognitive Data Capture Is Not There Yet

One of the first questions we hear, usually within 60 seconds of meeting a potential customer, is: “What makes DocDigitizer different from any other cognitive data capture or OCR vendor?”

The question is truly significant. I’ll dive directly into the answer and then explore why, at DocDigitizer, we are taking a radically different approach than anyone else.

What makes DocDigitizer different? You send DocDigitizer a document, and you receive curated actionable structured data. Period. There’s no need for extra revision work.

Let me explain the problem. Suppose I deliver a box of 100 balls to you. Some are blue, some are red. Imagine there is a system that prints a label stating the ball color and sticks it on each ball. But the system is not perfect: 5% of the time it prints the wrong color, i.e., it puts a blue sticker on a red ball or vice versa.

Now the critical question: “By looking at the sticker on each ball, which balls have the wrong sticker?” And the answer is… you simply don’t know. You will have to check each and every ball.
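To make the ball example concrete, here is a small Python sketch. It is only an illustration of the thought experiment, not anything from our product: the function names, the random seed, and the 50/50 color mix are assumptions chosen for the example. It shows that the labels alone carry no hint about which ones are wrong, so the only way to find the bad stickers is to compare every label with the actual ball.

```python
import random

ERROR_RATE = 0.05  # the 5% labeling error from the example
NUM_BALLS = 100

def label_balls(true_colors, error_rate, rng):
    """Hypothetical labeler: flips the color with probability error_rate."""
    flip = {"red": "blue", "blue": "red"}
    return [flip[c] if rng.random() < error_rate else c for c in true_colors]

def find_wrong_labels(true_colors, labels):
    """Compare every sticker with the real ball and count the checks made."""
    checks = 0
    wrong = []
    for index, (color, label) in enumerate(zip(true_colors, labels)):
        checks += 1  # each ball has to be inspected by hand
        if color != label:
            wrong.append(index)
    return wrong, checks

rng = random.Random(0)  # arbitrary seed, only for repeatability
true_colors = [rng.choice(["red", "blue"]) for _ in range(NUM_BALLS)]
labels = label_balls(true_colors, ERROR_RATE, rng)
wrong, checks = find_wrong_labels(true_colors, labels)

# Chance that a box of 100 balls contains no wrong sticker at all.
p_all_correct = (1 - ERROR_RATE) ** NUM_BALLS

print(f"balls inspected: {checks}, wrong stickers found: {len(wrong)}")
print(f"probability that every sticker is right: {p_all_correct:.4f}")
```

With a 5% error rate, the probability that a whole box of 100 is labeled correctly is well under 1%, so skipping the check is almost never safe, and the check itself costs 100 inspections.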

DocDigitizer, on the other hand, delivers accurate, verified content that can be used without additional validation. That is our core philosophy, and it makes all the difference.

The mislabeled balls are exactly the problem with competing solutions. If you are a bank, an insurer, a government agency, or a small accounting firm, you can’t afford to transfer the wrong amount to the wrong account number, even if it happens only 5% of the time. If you are running a KYC (know your customer) due diligence process, you don’t want to check the wrong data or the wrong person.

AI/ML and Humans

At DocDigitizer, we clearly understood the problem. No matter what technology our customers use (cloud OCR, cognitive extraction, etc.), they end up paying for revision services to verify all of the content again. In some cases, the cost is about the same. We also found that the rate of human error in the revision process can be as high as 10%.

But in 2020, machines are still not capable of verifying documents fully autonomously. Think of the current status quo for autonomous driving. We trust the autopilot only in very limited circumstances, and only with a driver able to take control of the vehicle if necessary.

To solve this conundrum, DocDigitizer takes advantage of the huge potential of artificial intelligence and machine learning and also provides fully certified and accurate content. We realized that human intervention is still needed. The conventional approach is to split the process into two parts. First, you send the documents to an OCR/cognitive extraction system. Second, you perform the revision.

DocDigitizer’s approach is radically different. Although there are still two clearly distinct components (the machine and the humans), the process is tightly coupled. I could say that we built a complex event processing engine with a real-time machine learning process, but then I might lose you. Simply put, the human revision process occurs in real time and feeds the learning process in real time. It’s like having a personal coach who tells the computer what it’s doing right or wrong.
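For readers who prefer code, the idea of revision feeding learning in real time can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not DocDigitizer's engine: the class OnlineFieldModel, the function process_stream, the stand-in reviewer, and the sample values are all hypothetical. The only point is the shape of the loop: every item is verified by a person, and each verification updates the model immediately rather than in a later batch.

```python
from collections import Counter, defaultdict

class OnlineFieldModel:
    """Toy model (hypothetical): remembers which verified value reviewers
    chose most often for each raw extracted string."""

    def __init__(self):
        self._votes = defaultdict(Counter)

    def predict(self, raw):
        votes = self._votes.get(raw)
        if not votes:
            return raw  # nothing learned yet, pass the raw value through
        return votes.most_common(1)[0][0]

    def learn(self, raw, verified):
        self._votes[raw][verified] += 1

def process_stream(raw_values, reviewer, model):
    """Verify each item as it arrives and feed the result straight back."""
    delivered = []
    corrections = 0
    for raw in raw_values:
        guess = model.predict(raw)
        verified = reviewer(raw, guess)  # the human check happens inline
        if verified != guess:
            corrections += 1
        model.learn(raw, verified)       # learning happens immediately
        delivered.append(verified)
    return delivered, corrections

# Stand-in for a human editor: looks up the known correct reading.
truth = {"1O0": "100", "5l": "51"}

def reviewer(raw, guess):
    return truth[raw]

model = OnlineFieldModel()
delivered, corrections = process_stream(
    ["1O0", "5l", "1O0", "1O0", "5l"], reviewer, model
)
print(delivered, corrections)
```

In this toy run the reviewer has to correct each misreading only the first time it appears. After that the model's guess already matches, which is the coaching effect described above.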

To build such a system, we had to rethink human verification of content and data. Along the way, we learned a few things. Having AI/ML algorithms that understand humans is just as important as having AI/ML algorithms that extract the information.

At DocDigitizer, we know when an editor is starting to get tired, is making mistakes, or needs to take a break. We understand which type of content needs more verification and why. And this is tailored and customized for each specific editor. It is actually fascinating, as it takes us into the field of psychology. We also had to add gamification concepts to help editors remain engaged and focused on revision.

Getting back to that first and most important question: What makes DocDigitizer different? You send DocDigitizer a document, and you receive curated, actionable structured data without the need for extra revision work.

DocDigitizer is a Portuguese startup founded as a spin-off of Infosistema in 2016. It is dedicated to the development of intelligent process automation technologies through the use of machine learning.

Since its founding, DocDigitizer has recorded annual growth rates in excess of 100% and now supports leading companies in the banking, insurance, and financial services industries in more than six geographies.

