The OCR Is Dead, Long Live Textract!

Amazon Textract has received significant hype in the OCR landscape. Since its launch in November 2018, countless articles and press releases have discussed Textract’s potential to revolutionize the OCR industry.

What Has Been The Real Impact Of Amazon Textract On OCR Technologies?

Now that the dust has begun to settle, it’s time to look back and assess to what extent this potential has been realized and what real impact Amazon Textract has had on incumbent OCR technologies.

What Is Amazon Textract?

According to Amazon, Textract is a service designed to extract text and data from virtually any document.

Textract combines proprietary OCR technologies with machine learning, drawing on the same deep learning technology that Amazon’s computer vision scientists use to analyze billions of images and videos daily. According to Amazon, this approach lets the models keep learning from new data and allows Amazon to deploy new features continuously.
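To make "extracting text" concrete at the API level, here is a minimal Python sketch. It assumes a response shaped like the one returned by Textract's DetectDocumentText operation: a `Blocks` list whose entries carry a `BlockType` (such as `PAGE`, `LINE`, or `WORD`) and, for text blocks, a `Text` field. The `extract_lines` helper and the sample response are hypothetical illustrations; the real boto3 call appears only as a comment so the snippet runs without AWS credentials.

```python
from typing import Any

# With boto3 and AWS credentials, a response could be obtained roughly like this:
#   response = boto3.client("textract").detect_document_text(
#       Document={"Bytes": image_bytes}
#   )


def extract_lines(response: dict[str, Any]) -> list[str]:
    """Return the text of every LINE block, in the order the response lists them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


if __name__ == "__main__":
    # Hypothetical, hand-written response used only for illustration.
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Invoice 123"},
            {"BlockType": "WORD", "Text": "Invoice"},
            {"BlockType": "LINE", "Text": "Total: 42.00"},
        ]
    }
    for line in extract_lines(sample):
        print(line)
```

Filtering on `LINE` blocks gives reading-order text; anything richer, such as key-value pairs or tables, requires other operations and more response parsing.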

On the business side, Amazon Textract is very competitively priced, starting at $0.0015 per page (0.15 cents) for basic OCR, or $1.50 per 1,000 pages.
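The per-page figure follows directly from the per-1,000-pages price: $1.50 / 1,000 = $0.0015 per page. The sketch below turns that into a simple cost estimate. It assumes a single flat rate of $1.50 per 1,000 pages for basic text detection, as quoted in this article; real AWS pricing is tiered by volume and region and may change, and the names `price_per_page` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal

# Assumption: flat list price for basic OCR, in USD, as quoted in the article.
# Actual AWS pricing is tiered and may differ by region and over time.
PRICE_PER_1000_PAGES = Decimal("1.50")


def price_per_page(price_per_1000: Decimal = PRICE_PER_1000_PAGES) -> Decimal:
    """Convert a price per 1,000 pages into a price per single page."""
    return price_per_1000 / 1000


def estimate_cost(pages: int, price_per_1000: Decimal = PRICE_PER_1000_PAGES) -> Decimal:
    """Estimate the cost of processing `pages` pages at a flat per-page rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return price_per_page(price_per_1000) * pages


if __name__ == "__main__":
    print(price_per_page())  # 0.0015 (USD per page, i.e. 0.15 cents)
    print(estimate_cost(250_000))
```

`Decimal` is used instead of `float` so that monetary amounts such as 0.0015 are represented exactly.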

Amazon Textract is Amazon’s attempt to move into the uncharted waters of OCR 2.0 and establish itself as one of the pioneers in this space.

Is Textract A Game Changer?

Setting aside the buzz and marketing clichés, the reality is that Textract seems somewhat dated compared with what we’ve seen in recent years, in both the breadth of its features and its use cases.

Amazon’s messaging fails to capture the capabilities of modern commercial OCR technologies: many of the challenges it points out as limitations of these solutions have been solved for quite a few years.

It’s also worth mentioning that, based on what Amazon presented in its launch webinar, Textract lags well behind in the detection, understanding, and processing of table data, even though table handling was presented as one of Textract’s unique selling points.

Despite these limitations, we view Textract as a promising opportunity to push OCR technologies forward and, above all, to change and educate the market on new trends that incumbent technologies are not addressing.

Textract is not closing any doors to OCR solutions. It is, in fact, laying the groundwork for the development of new and improved data capture solutions.

Textract’s competitive edge against low-level OCR providers will lie in using Amazon’s scale and access to data to pressure them on price.

Amazon is not alone in this game; Google Vision and Azure Cognitive Services are offering roughly the same features with a similar market approach.

What Is DocDigitizer?

DocDigitizer is a Data Capture service that leverages both proprietary and third-party technologies.

Our value proposition goes well beyond image-to-text recognition or unvalidated, generic data structures.

As inbound digital documents grow in quantity and diversity, automation must be able to understand semantic information rather than follow hard-coded rules or configurations.

There was a time when automation meant processing scanned, structured documents in back-office operations.

Nowadays, businesses are not able to impose strict formats on the way they communicate with their customers. Digital transformation simultaneously offers greater diversity, more communication channels, and faster response times.

Robotic Process Automation (RPA) is also shaping the automation landscape, with a tremendous impact on data-entry processes. Smart, scalable, and precise data capture solutions are key to supporting both digital transformation and RPA initiatives.

At DocDigitizer, we are democratizing access to cognitive data capture, providing the following benefits over more traditional approaches:

We process any document (structured or unstructured), regardless of its language, layout, or domain.
We don’t have any setup costs.
We provide fully validated data with 99% accuracy guaranteed.
We leverage state-of-the-art proprietary Machine Learning and Natural Language Processing technologies.
We combine the speed and scale of machine data capture with the assurance that only our crowdsourced data curators can provide.

All of this is delivered as an API service, providing the automation enabler you need to support any digital transformation or RPA initiative.

Start your digital transformation now!
