This project implements MTL-TabNet (Multi-task Learning based Model for Image-based Table Recognition). It is built on the TableMASTER-mmocr repository; many thanks to its authors for their excellent work.
The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for the three sub-tasks of table recognition, as shown in Fig. 1. The shared encoder encodes the input table image as a sequence of features. This sequence is passed to the shared decoder and then to the structure decoder, which predicts a sequence of HTML tags representing the structure of the table. When the structure decoder produces an HTML tag that starts a new cell (‘<td>’ or ‘<td ...’), the shared decoder output for that cell and the shared encoder output are passed to the cell-bbox decoder and the cell-content decoder, which predict the bounding box coordinates and the text content of that cell. Finally, the text content of each cell is inserted into the corresponding HTML structure tags to produce the final HTML code for the input table image.
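The final merging step described above can be sketched in Python. This is a minimal illustration, not the repository's code: the helper name `merge_cells_into_structure` is hypothetical, and the structure tokens are assumed to follow a PubTabNet-style vocabulary in which every cell ends with a `</td>` token.

```python
def merge_cells_into_structure(structure_tokens, cell_texts):
    """Insert predicted cell texts into a sequence of HTML structure tokens.

    Hypothetical helper: assumes every cell is closed by a '</td>' token and
    that cell_texts is ordered in the same way as the cells in the structure.
    """
    texts = iter(cell_texts)
    parts = []
    for token in structure_tokens:
        if token == "</td>":
            # The cell content goes right before the closing tag.
            # Cells without a predicted text are left empty.
            parts.append(next(texts, ""))
        parts.append(token)
    return "".join(parts)


# One plain cell ('<td>') and one spanning cell ('<td ...').
tokens = ["<table>", "<tr>", "<td>", "</td>",
          "<td", ' colspan="2"', ">", "</td>", "</tr>", "</table>"]
print(merge_cells_into_structure(tokens, ["a", "b"]))
```

Inserting the text before each closing tag handles both the plain ‘<td>’ case and the ‘<td ...’ case with attributes in the same way, since both end in the same closing token.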
- Dataset: PubTabNet, the competition dataset.
- For details about PubTabNet, see its GitHub repository and paper.
- For the TEDS metric, see its GitHub repository.
- Install mmdetection (see the mmdetection documentation for details).
  # The mmdetection-2.11.0 source code is embedded in this project.
  # Installing from this copy is recommended.
  cd ./mmdetection-2.11.0
  pip install -v -e .
- Install mmocr (see the mmocr documentation for details).
  # install mmocr
  cd {Path to TableMASTER_mmocr}
  pip install -v -e .
- Install mmcv-full-1.3.4 (see the mmcv documentation for details).
  pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
  # install mmcv-full-1.3.4 with torch version 1.8.0 and CUDA version 10.2
  pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
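The mmcv-full command template above has three placeholders. As a small sketch of how they are filled in, the hypothetical helper below (its name is not part of this project) builds the command string for a given version combination; it only formats text and does not check that a wheel exists for that combination.

```python
def build_mmcv_install_command(mmcv_version, cu_version, torch_version):
    """Fill the mmcv-full install template with concrete versions.

    Hypothetical helper for illustration only.
    """
    index_url = (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_version}/{torch_version}/index.html"
    )
    return f"pip install mmcv-full=={mmcv_version} -f {index_url}"


# Reproduces the example command for torch 1.8.0 and CUDA 10.2.
print(build_mmcv_install_command("1.3.4", "cu102", "torch1.8.0"))
```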
Run data_preprocess.py to generate valid training data. Before running it, set the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.
python ./table_recognition/data_preprocess.py
Parsing the 500777 training files takes about 8 hours. After the training set has been parsed, change the 'split' property of PubtabnetParser to 'val' and run the script again to get the formatted validation data.
The directory structure of the parsed training data is:
.
├── StructureLabelAddEmptyBbox_train
│ ├── PMC1064074_007_00.txt
│ ├── PMC1064076_003_00.txt
│ ├── PMC1064076_004_00.txt
│ └── ...
├── recognition_train_img
│ ├── 0
│ │ ├── PMC1064100_007_00_0.png
│ │ ├── PMC1064100_007_00_10.png
│ │ ├── ...
│ │ └── PMC1064100_007_00_108.png
│ ├── 1
│ ├── ...
│ └── 15
├── recognition_train_txt
│ ├── 0.txt
│ ├── 1.txt
│ ├── ...
│ └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
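Before starting a long training run, it can help to confirm that preprocessing produced the layout shown above. The following is a minimal sketch: the helper name `missing_parsed_entries` is hypothetical, and the names used for the 'val' split are an assumption based on the pattern of the train names.

```python
import tempfile
from pathlib import Path


def missing_parsed_entries(save_root, split="train"):
    """Return the top-level entries of the parsed data that are missing.

    Hypothetical helper. The 'train' names come from the directory tree
    above; names for other splits are assumed to follow the same pattern.
    """
    root = Path(save_root)
    expected = [
        f"StructureLabelAddEmptyBbox_{split}",
        f"recognition_{split}_img",
        f"recognition_{split}_txt",
        "structure_alphabet.txt",
        "textline_recognition_alphabet.txt",
    ]
    return [name for name in expected if not (root / name).exists()]


# Demo on a temporary directory that contains only part of the layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "recognition_train_img").mkdir()
    (Path(tmp) / "structure_alphabet.txt").write_text("")
    print(missing_parsed_entries(tmp))
```

An empty list means all expected top-level entries are present; the check does not look inside the folders.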
Train the multi-task learning based table recognition model, MTL-TabNet:
sh ./table_recognition/expr/table_recognition_dist_train.sh
To get the final results, run:
python ./table_recognition/run_table_inference.py
run_table_inference.py calls table_inference.py and runs model inference on multiple GPU devices. Before running this script, change the value of cfg in table_inference.py.
The directory structure of the table recognition results is:
# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_result pickle file, and 8 structure recognition result pickle files.
.
├── structure_master_caches
│ ├── structure_master_results_0.pkl
│ ├── structure_master_results_1.pkl
│ ├── ...
│ └── structure_master_results_7.pkl
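The per-GPU result files can be combined into a single object for later evaluation. The sketch below is illustrative only: the helper name `load_structure_results` is hypothetical, and it assumes that each pickle file holds a dict (for example, keyed by image file name). The actual content of these files is not described here, so check it before relying on this.

```python
import pickle
from pathlib import Path


def load_structure_results(cache_dir):
    """Merge all structure_master_results_*.pkl files found in cache_dir.

    Hypothetical helper. Assumes each pickle file contains a dict; entries
    from later files overwrite entries with the same key.
    """
    merged = {}
    for path in sorted(Path(cache_dir).glob("structure_master_results_*.pkl")):
        with open(path, "rb") as f:
            part = pickle.load(f)
        if not isinstance(part, dict):
            raise TypeError(f"{path.name} does not contain a dict")
        merged.update(part)
    return merged
```

For example, `load_structure_results("./structure_master_caches")` would return one dict covering the results of all GPU workers, under the assumption stated above. Only unpickle files that you generated yourself, since loading a pickle can execute arbitrary code.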
- Install the requirements.
  pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt
- Get gtVal.json.
  python ./table_recognition/get_val_gt.py
- Calculate the TEDS score. Before running this script, set the prediction file path and the ground-truth file path in mmocr_teds_acc_mp.py.
  python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
TEDS score
Datasets | TEDS (%) | TEDS-struct. (%) |
---|---|---|
FinTabNet | - | 98.79 |
PubTabNet | 96.67 | 97.88 |
This project is licensed under the MIT License. See LICENSE for more details.
@inproceedings{visapp23namly,
title={An End-to-End Multi-Task Learning Model for Image-based Table Recognition},
author={Nam Tuan Ly and Atsuhiro Takasu},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
year={2023},
pages={626-634},
publisher={SciTePress},
doi={10.5220/0011685000003417},
}