Jump to content

AWS Textract brings intelligence to OCR


NelsonG

Recommended Posts

One of the challenges just about every business faces is converting forms to a useful digital format. This has typically involved using human data entry clerks to enter the data into the computer. State of the art involved using OCR to read forms automatically, but AWS CEO Andy Jassy explained that OCR is basically just a dumb text reader. It doesn’t recognize text types. Amazon wanted to change that and today it announced Textract, an intelligent OCR tool to move data from forms to a more useable digital format.

In an example, he showed a form with tables. Regular OCR didn’t recognize the table and interpreted it as a string of text. Textract is designed to recognize common page elements like a table and pull the data in a sensible way.

Jassy said that forms also often change and if you are using a template as a work-around for OCR’s lack of intelligence, the template breaks if you move anything. To fix that, Textract is smart enough to understand common data types like social security numbers, dates of birth and addresses and it interprets them correctly no matter where they fall on the page.

“We have taught Textract to recognize this set of characters is a date of birth and this is a social security number. If forms change Textract won’t miss it,” Jassy explained

more AWS re:Invent 2018 coverage

Techcrunch?d=2mJPEYqXBVI Techcrunch?d=7Q72WNTAKBA Techcrunch?d=yIl2AUoC8zA Techcrunch?i=rHgyavWixzw:Lqgnm7BYd_o:-BT Techcrunch?i=rHgyavWixzw:Lqgnm7BYd_o:D7D Techcrunch?d=qj6IDK7rITs
rHgyavWixzw

View the full article

Link to comment
Share on other sites

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Our picks

    • Wait, Burning Man is going online-only? What does that even look like?
      You could have been forgiven for missing the announcement that actual physical Burning Man has been canceled for this year, if not next. Firstly, the nonprofit Burning Man organization, known affectionately to insiders as the Borg, posted it after 5 p.m. PT Friday. That, even in the COVID-19 era, is the traditional time to push out news when you don't want much media attention. 
      But secondly, you may have missed its cancellation because the Borg is being careful not to use the C-word. The announcement was neutrally titled "The Burning Man Multiverse in 2020." Even as it offers refunds to early ticket buyers, considers layoffs and other belt-tightening measures, and can't even commit to a physical event in 2021, the Borg is making lemonade by focusing on an online-only version of Black Rock City this coming August.    Read more...
      More about Burning Man, Tech, Web Culture, and Live EventsView the full article
      • 0 replies
    • Post in What Are You Listening To?
      Post in What Are You Listening To?
    • Post in What Are You Listening To?
      Post in What Are You Listening To?
    • Post in What Are You Listening To?
      Post in What Are You Listening To?
    • Post in What Are You Listening To?
      Post in What Are You Listening To?
×
×
  • Create New...