TNCR: Table Net Detection and Classification Dataset
Abdelrahman Abdallah, Alexander Berendeev, Islam Nuradin, Daniyar Nurseitov
We present TNCR, a new table dataset with varying image quality, collected from open-access websites. The TNCR dataset can be used to detect tables in scanned document images and to classify them into 5 different classes.
TNCR contains 9428 labeled tables across approximately 6621 images. In this paper, we implement state-of-the-art deep learning-based methods for table detection to create several strong baselines. Deformable DETR with a ResNet-50 backbone achieves the highest performance of the methods compared, with a precision of 86.7%, a recall of 89.6%, and an F1 score of 88.1% on the TNCR dataset. We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition.
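As a quick sanity check on the reported numbers, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two figures quoted above; the function name `f1_score` is only illustrative and is not part of the TNCR code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported above for Deformable DETR with a ResNet-50 backbone.
f1 = f1_score(0.867, 0.896)
print(f"{f1:.3f}")  # 0.881, i.e. the 88.1% quoted above
```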
TNCR has been implemented and tested with Python 3.7 and PyTorch 1.8.1.
%cd $project_dir$
!pip install -q mmcv terminaltables
!git clone 'https://github.com/open-mmlab/mmdetection.git'
!pip install -r "$project_dir$/mmdetection/requirements/optional.txt"
%cd mmdetection/
!python setup.py install
!python setup.py develop
!pip install -r {"$project_dir$/mmdetection/requirements.txt"}
!pip install pillow
!pip install mmcv
!pip install mmcv-full
%cd ..
!pip uninstall -y pycocotools
!pip uninstall -y mmpycocotools
!pip install mmpycocotools
Python: 3.7
PyTorch: 1.8.1
OpenCV: 4.5.2
MMCV: 1.3.5
MMDetection: v2.10.0
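After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the TNCR repository: the helper names `version_matches` and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute under the import names `torch`, `cv2`, `mmcv`, and `mmdet`.

```python
import importlib
import sys

# Versions listed above; keys are import names, not pip package names.
EXPECTED = {"torch": "1.8.1", "cv2": "4.5.2", "mmcv": "1.3.5", "mmdet": "2.10.0"}


def version_matches(installed: str, expected: str) -> bool:
    """True if `installed` equals `expected` or only adds a suffix (e.g. '1.8.1+cu102')."""
    return installed == expected or installed.startswith((expected + "+", expected + "."))


def check_environment(expected=EXPECTED) -> dict:
    """Map each import name to 'ok', 'not installed', or a mismatch message."""
    report = {}
    for module_name, wanted in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "not installed"
            continue
        found = getattr(module, "__version__", "unknown")
        if version_matches(found, wanted):
            report[module_name] = "ok"
        else:
            report[module_name] = f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} (expected 3.7)")
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

A mismatch is not necessarily fatal, but MMDetection and MMCV releases are tightly coupled, so differences there are worth checking first.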
You can download the dataset through this link, or from Google Drive, where it is split into 5 parts.
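Once the dataset is downloaded, a per-class count is a useful first check. The sketch below assumes the annotations are stored as COCO-style JSON (with `categories` and `annotations` lists); this README does not state the annotation format, so treat that as an assumption. The function names and the example data are hypothetical.

```python
import json
from collections import Counter


def load_annotations(path: str) -> dict:
    """Load a COCO-style annotation file (assumed format) into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_tables_per_class(coco: dict) -> Counter:
    """Count annotated tables per category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


# Tiny in-memory example in place of a real file; the values are made up.
example = {
    "categories": [{"id": 1, "name": "Full Lined"}, {"id": 2, "name": "No Lines"}],
    "annotations": [
        {"image_id": 10, "category_id": 1, "bbox": [5, 5, 100, 40]},
        {"image_id": 10, "category_id": 2, "bbox": [5, 60, 100, 40]},
        {"image_id": 11, "category_id": 1, "bbox": [0, 0, 200, 80]},
    ],
}
counts = count_tables_per_class(example)
print(counts["Full Lined"], counts["No Lines"])  # 2 1
```

For a real file, `count_tables_per_class(load_annotations(path))` would be the equivalent call, with `path` pointing at wherever the annotations were unpacked.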
The five table classes are Full Lined, Merged Cells, No Lines, Partial Lined, and Partial Lined Merged Cells.

All config and checkpoint files are available at this link.
Check out our demo notebook for loading checkpoints and performing inference.
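For readers who prefer a script to the notebook, the sketch below shows one way to load a checkpoint and run inference with the MMDetection 2.x functions `init_detector` and `inference_detector`. It is not taken from the demo notebook: the helper names `filter_detections` and `detect_tables` are hypothetical, and the config, checkpoint, and image paths are placeholders to be replaced with files from the tables below.

```python
def filter_detections(result, class_names, score_thr=0.5):
    """Flatten a per-class bbox result into (class_name, score, box) tuples.

    `result` holds one sequence of [x1, y1, x2, y2, score] rows per class,
    which is the layout MMDetection 2.x returns for box detectors.
    """
    # Mask-capable models return (bbox_results, segm_results); keep only the boxes.
    bbox_result = result[0] if isinstance(result, tuple) else result
    detections = []
    for class_name, boxes in zip(class_names, bbox_result):
        for x1, y1, x2, y2, score in boxes:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                detections.append((class_name, float(score), box))
    # Highest-confidence detections first.
    return sorted(detections, key=lambda d: d[1], reverse=True)


def detect_tables(config_path, checkpoint_path, image_path,
                  device="cuda:0", score_thr=0.5):
    """Run one image through a trained detector and return filtered detections."""
    # Imported lazily so filter_detections can be used without MMDetection installed.
    from mmdet.apis import inference_detector, init_detector

    model = init_detector(config_path, checkpoint_path, device=device)
    result = inference_detector(model, image_path)
    # model.CLASSES is filled from the checkpoint metadata when available.
    return filter_detections(result, model.CLASSES, score_thr)
```

A call would look like `detect_tables("path/to/config.py", "path/to/checkpoint.pth", "page.png")`, where all three paths are placeholders. Reading class names from `model.CLASSES` avoids hard-coding a class order, which this README does not specify.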
Backbones | Config Files | Checkpoint File |
---|---|---|
Resnet-50_1x | Config Files | Checkpoint |
Resnet-50_20e | Config Files | Checkpoint |
Resnet-101_1x | Config Files | Checkpoint |
Resnet-101_20e | Config Files | Checkpoint |
ResNeXt-101-32x4d_1x | Config Files | Checkpoint |
ResNeXt-101-64x4d_1x | Config Files | Checkpoint |
Backbones | Config Files | Checkpoint File |
---|---|---|
Resnet-50_1x | Config Files | Checkpoint |
Resnet-50_20e | Config Files | Checkpoint |
Resnet-101_1x | Config Files | Checkpoint |
Resnet-101_20e | Config Files | Checkpoint |
ResNeXt-101-32x4d_1x | Config Files | Checkpoint |
ResNeXt-101-64x4d_1x | Config Files | Checkpoint |
Method | Backbones | Config Files | Checkpoint File |
---|---|---|---|
Fast R-CNN | Resnet-50_1x | Config Files | Checkpoint |
CRPN | Resnet-50_1x | Config Files | Checkpoint |
Backbones | Config Files | Checkpoint File |
---|---|---|
Resnet-50_1x | Config Files | Checkpoint |
Resnet-50_20e | Config Files | Checkpoint |
Resnet-101_1x | Config Files | Checkpoint |
Backbones | Config Files | Checkpoint File |
---|---|---|
DarkNet-53_320 | Config Files | Checkpoint |
DarkNet-53_416 | Config Files | Checkpoint |
DarkNet-53_608 | Config Files | Checkpoint |
Backbones | Config Files | Checkpoint File |
---|---|---|
R-50_1 | Config Files | Checkpoint |
The code of TNCR is open source under the MIT License. There are no restrictions on academic or commercial use.
If you find this work useful for your research, please cite our paper:
@misc{abdallah2021tncr,
title={TNCR: Table Net Detection and Classification Dataset},
author={Abdelrahman Abdallah and Alexander Berendeyev and Islam Nuradin and Daniyar Nurseitov},
year={2021},
eprint={2106.15322},
archivePrefix={arXiv},
primaryClass={cs.CV}
}