DocAI Expense Parser Demo

Objective

Learn how to use Google Cloud Platform to build a pipeline that processes expenses (i.e., receipts). This repo provides sample code for building your own demo and has not been tested for production.

Visualizing the workflow

(Diagram: GCP workflow)

GCP Services used in the Demo

Steps to re-create this demo in your own GCP environment

  1. Create a Google Cloud Platform Project

  2. Enable the Cloud Document AI API, Cloud Functions API, and Cloud Build API in the project you created in Step 1
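
    You can enable these from the console (APIs & Services) or from Cloud Shell; a minimal sketch, assuming gcloud is already authenticated and set to your project:

      gcloud services enable documentai.googleapis.com \
        cloudfunctions.googleapis.com \
        cloudbuild.googleapis.com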

  3. If you do not have access to the parser, request access via this link. Here is a link to the official Expense Parser documentation.

  4. Create a service account that will later be used by Cloud Functions

    1. Navigate to IAM & Admin -> Service Accounts
    2. Click on Create a service account
    3. In the Service account name section, type in process-receipt-example or a name of your choice
    4. Click Create and continue
    5. Grant this service account the following roles:
      • Storage Admin
      • BigQuery Admin
      • Document AI API User
    6. Click Done; you should then see the new service account listed on the IAM main page. (An equivalent gcloud sketch is shown below.)
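
    If you prefer the command line, a rough gcloud equivalent of this step (a sketch only, assuming your project ID is in $PROJECT_ID and you keep the name process-receipt-example):

      gcloud iam service-accounts create process-receipt-example \
        --display-name="process-receipt-example"

      SA_EMAIL="process-receipt-example@${PROJECT_ID}.iam.gserviceaccount.com"
      for ROLE in roles/storage.admin roles/bigquery.admin roles/documentai.apiUser; do
        # Grant the Storage Admin, BigQuery Admin, and Document AI API User roles
        gcloud projects add-iam-policy-binding "$PROJECT_ID" \
          --member="serviceAccount:${SA_EMAIL}" --role="$ROLE"
      done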
  5. Create your Doc AI processor

    • At this point, your request from Step 3 should be approved and you should have access to the expense parser
    • Navigate to console -> Document AI -> processors
    • Click Create processor and choose expense parser
    • Name your processor and click Create
    • Take note of your processor's region (e.g., us) and processor ID; you can confirm both via the REST API, as sketched below
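
    If you want to confirm the region and processor ID from the terminal, a hedged sketch using the Document AI v1 REST API (assuming the us region and an authenticated gcloud session):

      curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://us-documentai.googleapis.com/v1/projects/${PROJECT_ID}/locations/us/processors"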
  6. Activate Cloud Shell and clone this GitHub repo using the commands:

gh repo clone jiya-zhang/docai-expense-parser-demo
git checkout -b v1api
  7. Execute the Bash shell scripts in your Cloud Shell terminal to create the cloud resources (i.e., Google Cloud Storage buckets, Pub/Sub topics, Cloud Functions, and a BigQuery dataset and table)

    1. Change directory into the cloned repo (where the setup scripts live)

      cd docai-expense-parser-demo
      
    2. Update the following values in .env.local (an illustrative example follows the command below):

      • PROJECT_ID should match your current project's ID
      • BUCKET_LOCATION is where you want the raw receipts to be stored
      • CLOUD_FUNCTION_LOCATION is where your code executes
      • CLOUD_FUNCTION_SERVICE_ACCOUNT should be the name of the service account you created in Step 4
      vim .env.local
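
      For illustration only, the file might end up looking like this (hypothetical values, assuming simple KEY=value lines; substitute your own project ID, regions, and service account name):

        PROJECT_ID=my-receipts-project
        BUCKET_LOCATION=us-central1
        CLOUD_FUNCTION_LOCATION=us-central1
        CLOUD_FUNCTION_SERVICE_ACCOUNT=process-receipt-example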
      
    3. Make your .sh files executable

      chmod +x set-up-pipeline.sh
      
    4. Change directory to the cloud functions folder

      cd cloud-functions
      
    5. Update the following values in .env.yaml (from your note in Step 5; an illustrative example follows the command below):

      • PARSER_LOCATION
      • PROCESSOR_ID
      vim .env.yaml
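
      For illustration only, .env.yaml might look like this (hypothetical values; use the region and processor ID you noted in Step 5):

        PARSER_LOCATION: us
        PROCESSOR_ID: 1234567890abcdef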
      
    6. Go back to the original folder and execute the setup script to create the cloud resources (a rough sketch of what it provisions is shown after the command)

      cd ..
      ./set-up-pipeline.sh
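
      Roughly speaking, the script provisions resources along these lines (a sketch only; the script itself is authoritative, and the exact names, options, and dataset/table names may differ):

        # Input bucket for raw receipts (name taken from the testing step below)
        gsutil mb -l "$BUCKET_LOCATION" "gs://${PROJECT_ID}-input-receipts"
        # Hypothetical BigQuery dataset for the extracted entities
        bq mk --dataset "${PROJECT_ID}:receipts"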
      
  8. Testing/Validating the demo

    1. Upload a sample receipt to the input bucket (<project_id>-input-receipts)
    2. At the end of processing, you should see your BigQuery tables populated with the extracted entities (e.g., total_amount, supplier_name)
    3. With the structured data in BigQuery, you can then build downstream analytical tools to gain actionable insights and detect errors or fraud. A command-line sketch of this test is shown below.
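
    A command-line sketch of this test, assuming a local file named sample-receipt.pdf and hypothetical dataset/table names (substitute whatever set-up-pipeline.sh actually created):

      # Upload a receipt to the input bucket to trigger the pipeline
      gsutil cp sample-receipt.pdf "gs://${PROJECT_ID}-input-receipts/"

      # Once the Cloud Function has run, inspect the extracted entities
      bq query --use_legacy_sql=false \
        "SELECT * FROM \`${PROJECT_ID}.receipts.extracted_entities\` LIMIT 10"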