t5-deep-ocr-extractor

Scanned receipt information extraction with Google Vision and T5 models

Introduction

This repository contains code to solve Task 3 (Key Information Extraction from Scanned Receipts) of the ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (SROIE). A detailed description of the challenge and its tasks can be found at ICDAR 2019's official page. The task consists of extracting four specific fields ("address", "company", "total", "date") from scanned receipt images.
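
Concretely, each receipt in the training data comes with a small ground-truth annotation holding one value per field. The example below is purely illustrative (the values are made up, not taken from the dataset):

```python
# Illustrative SROIE Task 3 ground-truth annotation (made-up values).
ground_truth = {
    "company": "ACME TRADING SDN BHD",
    "date": "15/03/2019",
    "address": "12 JALAN EXAMPLE, 50000 KUALA LUMPUR",
    "total": "28.50",
}
```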

The core of our solution is to cast the problem into a text-to-text format and then fine-tune T5 models to extract the fields by generating their contents. We use Google Vision OCR to extract the text from the scanned images and then feed that text to the T5 models.
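
As a rough illustration of this casting, the sketch below builds one (input, target) pair per field from a receipt's OCR text. The "field: ocr_text" prompt layout is an assumption made for illustration; the actual format lives in src/data/sroie/t5_ocr_baseline.py:

```python
# Sketch of casting key information extraction as a text-to-text task.
# The "<field>: <ocr text>" prompt layout is an assumption; see
# src/data/sroie/t5_ocr_baseline.py for the format actually used.
FIELDS = ["address", "company", "total", "date"]

def make_t5_examples(ocr_text: str, annotation: dict) -> list:
    """Build one (input, target) pair per field for T5 fine-tuning."""
    examples = []
    for field in FIELDS:
        source = f"{field}: {ocr_text}"     # text fed to the T5 encoder
        target = annotation.get(field, "")  # text T5 is trained to generate
        examples.append((source, target))
    return examples
```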

This work was done as the final project for the course "Projects on Deep Learning for Images and NLP" during the second semester of 2020, under the guidance of professors Rodrigo Nogueira and Roberto Lotufo.

All experiments are logged on Neptune and can be found at this link. Some key libraries used in the project are:

  • pytorch-lightning: for reducing PyTorch boilerplate and configuring the training and evaluation loops (see the sketch after this list)
  • Hugging Face 🤗 transformers: for T5 models
  • gin-config: for experiment configuration
  • Neptune: for experiment tracking
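
As a rough sketch of how the first three of these fit together (class names and hyperparameters here are illustrative assumptions, not the repository's actual API; the real module lives in src/models/t5_ocr_baseline.py):

```python
# Hypothetical minimal sketch combining pytorch-lightning, transformers
# and gin-config; illustrative only, not the repository's actual module.
import gin
import torch
import pytorch_lightning as pl
from transformers import T5ForConditionalGeneration

@gin.configurable  # hyperparameters can then be set from a .gin file
class T5OcrBaseline(pl.LightningModule):
    def __init__(self, model_name: str = "t5-base", lr: float = 1e-4):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # batch holds tokenized "field: ocr_text" inputs and field targets.
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)
```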

How this repository is structured

.
|-- LICENSE
|-- README.md
|-- notebooks <-- Jupyter notebooks
|   |-- draft <-- draft notebooks
|   |-- select_best_sroie_checkpoints_t5_ocr_baseline_initial_finetune.ipynb <-- initial selection of the best models
|   `-- sroie_t5_ocr_baseline_prepare_competition_submission.ipynb <-- creates the files for the competition submission
|-- setup.py
`-- src
    |-- __init__.py
    |-- data <-- data handling code
    |   |-- __init__.py
    |   |-- google_vision_ocr_extraction.py <-- OCR extraction using Google Vision
    |   |-- google_vision_ocr_parsing.py <-- parsing of the OCR output generated by Google Vision
    |   `-- sroie
    |       |-- __init__.py
    |       `-- t5_ocr_baseline.py <-- PyTorch Dataset used by the models
    |
    |-- evaluation <-- code and scripts for model evaluation
    |   |-- __init__.py
    |   |-- save_experiment_predictions_t5_ocr_baseline.py <-- saves model predictions for the experiments
    |   |-- save_preds_t5_final_finetuning.sh
    |   |-- save_preds_t5_initial_finetuning.sh
    |   `-- sroie_eval_utils.py
    |-- metrics.py <-- metrics used
    |-- models <-- model code
    |   |-- __init__.py
    |   |-- gin <-- gin files with the configurations of all conducted experiments
    |   |   |-- README.md
    |   |   |-- best_t5_models_defaults.gin <-- default gin config for final model training (on all labeled data with the best hyperparameter combinations)
    |   |   |-- defaults.gin <-- default gin config for the initial finetune experiments
    |   |   |-- generate_t5_default_finetune_gin_configs.py <-- generates the gin config files for the initial finetune experiments
    |   |   |-- t5_best_models_finetune <-- gin configs of the final models (trained on all labeled data with the best hyperparameter combinations)
    |   |   `-- t5_default_finetune <-- gin config files for the initial finetune experiments
    |   |-- gin_configurables.py <-- extends classes as gin configurables
    |   |-- gin_trainer_t5_ocr_baseline.py <-- main model training script
    |   |-- past_scripts <-- old scripts
    |   |-- t5_ocr_baseline.py <-- PyTorch Lightning model module
    |   `-- utils.py
    `-- utils.py
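
For reference, a minimal sketch of what the OCR extraction step in src/data/google_vision_ocr_extraction.py might look like, assuming the standard google-cloud-vision client and valid credentials (the actual implementation may differ):

```python
# Minimal OCR sketch with Google Cloud Vision (assumes the
# google-cloud-vision package and configured credentials).
from google.cloud import vision

def extract_receipt_text(image_path: str) -> str:
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    # full_text_annotation.text is the concatenated text of the page.
    return response.full_text_annotation.text
```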