Unofficial implementation of the ICDAR 2019 paper "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images".
Paper: TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images
TableNet is a deep learning architecture proposed by a team from TCS Research in 2019. The main motivation was to extract information from tables scanned with mobile phones or cameras.
The proposed solution first detects the tabular region within an image accurately, then detects and extracts information from the rows and columns of the detected table.
Architecture: The architecture is based on the encoder-decoder model for semantic segmentation by Long et al. The same encoder/decoder network is used as the FCN architecture for table extraction. The images are preprocessed using Tesseract OCR.
Source: Nanonets
pip install -r requirements.txt
- Download the Marmot Dataset from the link given in readme.
- Run data_preprocess/generate_mask.py to generate the table and column masks for the corresponding images.
- Follow the TableNet.ipynb notebook to train and test the model.
- Requires a capable system with a good GPU for accurate results on high-resolution images.
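
To illustrate the mask-generation step, here is a minimal sketch of how table and column masks could be built from column bounding boxes. It is not the code in data_preprocess/generate_mask.py, which may work differently. The box format (xmin, ymin, xmax, ymax, with exclusive max coordinates), the function names, and the idea of deriving the table region as the box enclosing all columns are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def fill_boxes(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Return a height x width binary mask with 1 inside every box."""
    mask = [[0] * width for _ in range(height)]
    for xmin, ymin, xmax, ymax in boxes:
        # Clip each box to the image bounds before filling.
        x0, x1 = max(0, xmin), min(width, xmax)
        y0, y1 = max(0, ymin), min(height, ymax)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1
    return mask


def enclosing_box(boxes: List[Box]) -> Box:
    """Smallest box that contains all the given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def make_masks(height: int, width: int, columns: List[Box]):
    """Build (table_mask, column_mask) for one table's column boxes."""
    column_mask = fill_boxes(height, width, columns)
    # Assumption: the table region is the box enclosing all of its columns.
    table_boxes = [enclosing_box(columns)] if columns else []
    table_mask = fill_boxes(height, width, table_boxes)
    return table_mask, column_mask


if __name__ == "__main__":
    table, cols = make_masks(6, 8, [(1, 1, 3, 5), (4, 1, 6, 5)])
    for row in cols:
        print("".join(str(v) for v in row))
```

In practice the masks would be saved as images with the same size as the input page, so that each pixel of the page has a table label and a column label for training.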
Download the dataset used in the paper: Marmot Dataset.