A Dataset for Analysing Complex Document Layouts in the Digital Humanities and its Evaluation with Krippendorff ’s Alpha
The code was branched from https://github.com/hikopensource/DAVAR-Lab-OCR and only necessary parts to run the code on the TexBiG Dataset have been included.
Nvidia Tesla V100-PCIE with 16GB VRAM and Ubuntu 18.04.
conda create -n vsr_lean python=3.7
conda activate vsr_lean
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.2 -c pytorch
pip install openmim
mim install mmcv-full=1.4.0
mim install mmdet=2.11.0
python setup.py install
The dataset can be found under:
https://doi.org/10.5281/zenodo.6885144
Since the models requires additonal text from the document pages some OCR data are available (generated with Tesseract). These data are far from perfect, but they allow training of the model.
The final model weights are included here:
https://doi.org/10.5281/zenodo.6885033
Follow the description here to prepare the data on your own and train the models from scratch: https://github.com/hikopensource/DAVAR-Lab-OCR/tree/2c518763134331cdaeb46d12aaec61fa3b0e6a02/demo/text_layout/VSR
Add the bert-base-uncased model to /demo/text_layout/common/
Add the mask_rcnn model to /demo/text_layout/common/
For running a test directly on the dataset download the pretrained model from URL-here and store it as demo/text_layout/VSR/TexBiG/log/test_texbig/publaynet_pretrained/eta/Best_checkpoint.pth
.
In order to run the code you will need to adapt the paths in /demo/text_layout/VSR/TexBiG/configs/eta.py
(for example, to adapt the eta configuration).
You should link to the dataset, pre-trained model and the BERT encoder.
tools/train.py demo/text_layout/VSR/TexBiG/configs/eta.py
tools/test.py demo/text_layout/VSR/TexBiG/configs/eta.py demo/text_layout/VSR/TexBiG/log/test_texbig/publaynet_pretrained/eta/Best_checkpoint.pth --eval bbox segm
export NGPU=2
for running on two GPU's
@inproceedings{10.1007/978-3-031-16788-1_22,
title={A Dataset for Analysing Complex Document Layouts in the Digital Humanities and its Evaluation with Krippendorff ’s Alpha},
author={Tschirschwitz, David and Klemstein, Franziska and Stein, Benno and Rodehorst, Volker},
booktitle="Pattern Recognition",
year={2022},
publisher="Springer International Publishing",
address="Cham",
pages="354--374",
}