
End-to-End Smart OCR


Enhancing Amazon Textract with pre- and post-processing

Amazon Textract's advanced extraction features go beyond simple OCR to recover structure from documents, including tables, key-value pairs (as on forms), and trickier cases like multi-column text.
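
For reference, extracting this structure is a single API call. Here's a minimal boto3 sketch; the bucket and object names are placeholders, not part of this solution:

```python
import boto3

textract = boto3.client("textract")

# Analyze a document stored in S3, asking Textract to recover
# key-value pairs (FORMS) and table structure (TABLES).
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-input-bucket", "Name": "receipts/receipt-001.jpg"}},
    FeatureTypes=["FORMS", "TABLES"],
)

# The response is a flat list of Block objects (PAGE, LINE, WORD,
# KEY_VALUE_SET, TABLE, CELL, ...) linked together by Relationships.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block.get("Text", ""))
```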

However, many practical applications need to combine this technology with use-case-specific logic - such as:

  • Pre-checking that submitted images are high-quality and of the expected document type
  • Post-processing structured text results into business-process-level fields (e.g. in one domain, "Amount", "Total Amount" and "Amount Payable" may be different raw annotations for the same thing, whereas in another the differences might be important; see the sketch after this list)
  • Human review and re-training flows
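
To make the post-processing point concrete, below is a minimal sketch of the kind of alias mapping such a step might apply. The alias table and function are illustrative, not this solution's actual code (which lives in source/ocr/postprocessing):

```python
# Hypothetical, domain-specific alias table: many raw annotations
# map onto one business-level field.
FIELD_ALIASES = {
    "amount": "total",
    "total amount": "total",
    "amount payable": "total",
    "merchant": "vendor",
    "vendor name": "vendor",
    "invoice date": "date",
}

def normalize_fields(raw_fields: dict) -> dict:
    """Map raw Textract key-value pairs onto business-process-level fields."""
    normalized = {}
    for raw_key, value in raw_fields.items():
        canonical = FIELD_ALIASES.get(raw_key.lower().strip(" :"))
        if canonical and canonical not in normalized:
            normalized[canonical] = value
    return normalized

print(normalize_fields({"Amount Payable:": "$12.50", "Merchant": "Acme Corp"}))
# -> {'total': '$12.50', 'vendor': 'Acme Corp'}
```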

This solution demonstrates how Textract can be integrated with:

  • Pre-processing to validate and clean up incoming images
  • Post-processing to map raw results onto business-level fields
  • Human review via Amazon Augmented AI (A2I)

...on a simple example use-case: extracting vendor, date, and total amount from receipt images.

The design is modular, to show how this pre- and post-processing can be easily customized for different applications.
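
On the pre-processing side, even a simple guard can save a wasted Textract call. A minimal sketch, assuming Pillow and hypothetical thresholds (the solution's real checks live in source/ocr/preprocessing):

```python
from PIL import Image  # pip install Pillow

MIN_WIDTH, MIN_HEIGHT = 800, 600  # hypothetical minimum resolution

def precheck_image(path: str) -> bool:
    """Reject inputs that are too small to OCR reliably."""
    with Image.open(path) as img:
        width, height = img.size
    return width >= MIN_WIDTH and height >= MIN_HEIGHT
```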

Solution Architecture Overview

This overview diagram does not show every AWS service used in the solution.

Smart OCR Architecture Diagram

The solution orchestrates the core OCR pipeline with AWS Step Functions - rather than direct point-to-point integrations - which gives us a customizable, graphically visualizable flow (defined in /source/ocr/StateMachine.asl.json):

AWS Step Functions Screenshot
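
For orientation, starting an execution of this state machine by hand would look something like the sketch below. The ARN and input shape are placeholders, and in the deployed solution you would not normally kick off executions manually like this:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder ARN and input shape - substitute your deployed
# state machine and document location.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:SmartOcr",
    input=json.dumps({"Bucket": "my-input-bucket", "Key": "receipts/receipt-001.jpg"}),
)
print(response["executionArn"])
```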

The client application and associated services are built and deployed as an AWS Amplify app, which simplifies setup of standard client-cloud integration patterns (e.g. user sign-up/login, authenticated S3 data upload).

Rather than have our web client poll the state machine for progress updates, we push messages via Amplify PubSub - powered by AWS IoT Core.
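
On the backend, publishing such a progress update to an IoT topic is straightforward with boto3. A minimal sketch, with an assumed topic name and message shape (the solution's real components are in source/ocr/ui-notifications):

```python
import json
import boto3

iot = boto3.client("iot-data")

# Assumed topic and payload - the web client subscribes to a matching
# topic via Amplify PubSub and renders the update.
iot.publish(
    topic="smart-ocr/progress/some-user-id",
    qos=1,
    payload=json.dumps({"executionId": "abc123", "status": "POSTPROCESSING"}),
)
```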

The Amplify build settings (in amplify.yml, with some help from the Makefile) define how both the Amplify-native and custom stack components are built and deployed, leaving us with the folder structure you see in this repository:

├── amplify                   [Auto-generated, Amplify-native service config]
├── source
│   ├── ocr                       [Custom, non-Amplify backend service stack]
│   │   ├── human-review              [Human review integration with Amazon A2I]
│   │   ├── postprocessing            [Extract business-level fields from Textract output]
│   │   ├── preprocessing             [Image pre-check/cleanup logic]
│   │   ├── textract-integration      [SFn-Textract integrations]
│   │   ├── ui-notifications          [SFn-IoT push notifications components]
│   │   ├── StateMachine.asl.json     [Processing flow definition]
│   │   └── template.sam.yml          [AWS SAM template for non-Amplify components]
│   └── webui                     [Front-end app (VueJS, BootstrapVue, Amplify)]
├── amplify.yml               [Overall solution build steps]
└── Makefile                  [Detailed build commands, to simplify amplify.yml]
NOTE: For details on each component, check the READMEs in their subfolders!
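
As one concrete example of how a component in this tree fits in: the human-review integration routes low-confidence results to an Amazon A2I human loop, conceptually like the sketch below (the threshold, names, and flow-definition ARN are placeholders):

```python
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 90.0  # hypothetical review cut-off

def maybe_start_review(result: dict) -> None:
    """Send a low-confidence extraction to human review (illustrative only)."""
    if result["confidence"] >= CONFIDENCE_THRESHOLD:
        return  # confident enough - no human review needed
    a2i.start_human_loop(
        HumanLoopName=f"review-{result['id']}",
        # Placeholder ARN - created when you set up the A2I flow definition.
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/smart-ocr-review",
        HumanLoopInput={"InputContent": json.dumps(result)},
    )
```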

Deploying the Solution

If you have:

...then you can go ahead and click the button below, which will fork the repository and deploy the base solution stack(s):

One-click deployment

From here, there are just a few extra (but non-trivial) manual configuration steps required to complete your setup:

Now you should be all set to upload images through the app UI, review low-confidence results through the Amazon A2I UI, and see the results!

The App in Action

"Successful extraction with review screenshot"