Subex Hackathon

An end-to-end deep learning approach for table detection and structure recognition in invoice documents

Results: 1st Place out of 150+ participants

1. Introduction

Finding Tables is an automatic table recognition method for interpreting tabular data in document images. We present an improved end-to-end deep learning approach that solves both table detection and structure recognition. For detection, Finding Tables uses a Faster R-CNN High-Resolution Network, pretrained on TableBank and fine-tuned on our invoice data, to locate table regions. For structure recognition, we propose a novel approach that leverages state-of-the-art NLP methods: we use LayoutLM, a BERT-based model, to process the text in the image and map it to question-answer pairs, which are then converted into JSON files.
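As a rough illustration of the last step (turning question-answer pairs into JSON), here is a minimal sketch. It is not the repository's code: the function name pairs_to_json, the (text, label) input format, and the label names "question", "answer" and "other" are all assumptions made for this example.

```python
import json

def pairs_to_json(tokens):
    """Group labelled words into question/answer pairs and serialise them.

    `tokens` is a list of (text, label) tuples. The label names used here
    ("question", "answer", "other") are assumed for illustration only.
    """
    result = {}
    question_words, answer_words = [], []

    def flush():
        # Store the finished pair (if any) and reset both buffers.
        if question_words:
            result[" ".join(question_words)] = " ".join(answer_words)
        question_words.clear()
        answer_words.clear()

    for text, label in tokens:
        if label == "question":
            # A question word that follows an answer starts a new pair.
            if answer_words:
                flush()
            question_words.append(text)
        elif label == "answer" and question_words:
            answer_words.append(text)
        # Any other label is ignored.
    flush()
    return json.dumps(result, indent=2)

example = [
    ("Invoice", "question"), ("No", "question"), ("12345", "answer"),
    ("Total", "question"), ("99.50", "answer"), ("Thanks", "other"),
]
print(pairs_to_json(example))
```

In this sketch, consecutive question words are joined into one key and the answer words that follow become its value, so the example yields the keys "Invoice No" and "Total".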

2. Setting it all up

Setting up LayoutLM:

git clone -b remove_torch_save https://github.com/NielsRogge/unilm.git
pip install unilm/layoutlm
git clone https://github.com/huggingface/transformers.git
pip install ./transformers

The code was developed with the following library versions:

torch==1.7.0+cu101
torchvision==0.8.1+cu101
CUDA 10.1
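To confirm that an environment matches these versions, a small check along the following lines may help. This is only a sketch: the helper names base_version and check_versions are hypothetical, and the comparison deliberately ignores the "+cu101" build tag.

```python
from importlib import metadata

# Versions taken from the list above; the local "+cu101" build tag is ignored.
EXPECTED = {"torch": "1.7.0", "torchvision": "0.8.1"}

def base_version(version):
    """Drop a local build suffix such as '+cu101'."""
    return version.split("+", 1)[0]

def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        ok = installed is not None and base_version(installed) == wanted
        report[name] = (wanted, installed, ok)
    return report

for name, (wanted, installed, ok) in check_versions().items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {wanted}, found {installed} [{status}]")
```

The check reads installed package metadata rather than importing torch, so it also runs on a machine where the libraries are missing and simply reports a mismatch.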

Installing Detectron2:

pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html

Installing other dependencies

pip install -r requirements.txt

Download the weights for Detectron2 and LayoutLM and place them in the data folder.

Detectron2 weights: Detectron_finetuned_model_weights

LayoutLM: LayoutLM weights

3. Running inference

To run the model on your own image, change into the repository folder and run "python run_inference.py 00017.PNG", replacing 00017.PNG with the path to your image file.
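To process several images in one go, the command above can be wrapped in a short script. The sketch below assumes it is started from the repository folder and that run_inference.py takes a single image path, as in the command above. The helper names and the list of accepted file extensions are assumptions, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the script accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def build_command(image_path, script="run_inference.py"):
    """Build the same command as above, using the current Python interpreter."""
    return [sys.executable, script, str(image_path)]

def find_images(folder):
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

def run_folder(folder):
    """Run inference on every image in `folder`, stopping on the first failure."""
    for image in find_images(folder):
        print(f"Running inference on {image}")
        subprocess.run(build_command(image), check=True)

# Example usage (hypothetical folder name):
# run_folder("invoices")
```

Passing check=True makes subprocess.run raise an error if one run fails, so a broken image or missing weight file is noticed immediately instead of being skipped silently.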

4. Examples

Original Image:

Detecting Images:

Example from Structure Recognition:

If you have trouble getting it to work, please feel free to contact me or raise an issue.