DocXclassifier: towards a robust and interpretable deep neural network for document image classification

This repository contains the evaluation code for the paper DocXclassifier: towards a robust and interpretable deep neural network for document image classification by Saifullah Saifullah, Stefan Agne, Andreas Dengel, and Sheraz Ahmed.

Requires Python 3+. For evaluation, please follow the steps below.

Environment Setup

Clone the repository

git clone https://github.com/saifullah3396/docxclassifier.git --recursive
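The evaluation code imports torchfusion from the `external/torchfusion` submodule (see the `PYTHONPATH` setting below), so that submodule must be checked out. The following is a minimal sketch for the case where `--recursive` was forgotten. `ensure_submodules` is a hypothetical helper that is not part of the repository. It takes the clone directory as its argument and, if the submodule sources are missing, falls back to the standard `git submodule update --init --recursive`:

```shell
# Hypothetical helper: make sure the torchfusion submodule is checked out.
# Usage: ensure_submodules [repo_dir]   (defaults to the current directory)
ensure_submodules() {
  local repo_dir="${1:-.}"
  # The submodule sources are what PYTHONPATH points to later on.
  if [ -d "$repo_dir/external/torchfusion/src" ]; then
    echo "torchfusion submodule present in $repo_dir"
    return 0
  fi
  echo "torchfusion submodule missing; fetching submodules..."
  git -C "$repo_dir" submodule update --init --recursive
}
```

For example, run `ensure_submodules docxclassifier` right after cloning.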

Install requirements

Install the dependencies:

pip install -r requirements.txt

Set up the environment variables:

export PYTHONPATH=./external/torchfusion/src
export DATA_ROOT_DIR=</your/data/root/dir>
# The cache and output directories can be any writable directories; datasets are cached and model training outputs are generated there.
export TORCH_FUSION_CACHE_DIR=</your/cache/dir>
export TORCH_FUSION_OUTPUT_DIR=</your/output/dir>
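Before running an evaluation script, you may want to confirm that these variables are set. The following is a minimal sketch. `check_env` is a hypothetical helper that is not shipped with the repository. It assumes bash, because it relies on indirect expansion. It checks only that the four variables above are non-empty and that `DATA_ROOT_DIR` already exists. The cache and output directories are not required to exist beforehand:

```shell
# Hypothetical helper: verify the environment variables used by the evaluation scripts.
# Prints each problem to stderr and returns non-zero if any check fails.
check_env() {
  local status=0 var
  for var in PYTHONPATH DATA_ROOT_DIR TORCH_FUSION_CACHE_DIR TORCH_FUSION_OUTPUT_DIR; do
    # ${!var:-} expands the variable whose name is stored in $var (empty if unset).
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      status=1
    fi
  done
  # The datasets are read from DATA_ROOT_DIR, so it must already exist.
  if [ -n "${DATA_ROOT_DIR:-}" ] && [ ! -d "$DATA_ROOT_DIR" ]; then
    echo "error: DATA_ROOT_DIR=$DATA_ROOT_DIR is not a directory" >&2
    status=1
  fi
  return "$status"
}
```

Run `check_env && echo "environment OK"` after exporting the variables.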

DocXClassifier Models

Model	Dataset	Accuracy
DocXClassifier-B	RVL-CDIP	94.00%
DocXClassifier-L	RVL-CDIP	94.15%
DocXClassifier-XL	RVL-CDIP	94.17%
DocXClassifier-B	Tobacco3482 (RVL-CDIP Pretraining)	95.29%
DocXClassifier-L	Tobacco3482 (RVL-CDIP Pretraining)	95.57%
DocXClassifier-XL	Tobacco3482 (RVL-CDIP Pretraining)	95.43%
DocXClassifier-B	Tobacco3482 (ImageNet Pretraining)	87.43%
DocXClassifier-L	Tobacco3482 (ImageNet Pretraining)	88.43%
DocXClassifier-XL	Tobacco3482 (ImageNet Pretraining)	90.14%

DocXClassifierFPN Models

Model	Dataset	Accuracy
DocXClassifierFPN-B	RVL-CDIP	94.04%
DocXClassifierFPN-L	RVL-CDIP	94.13%
DocXClassifierFPN-XL	RVL-CDIP	94.19%
DocXClassifierFPN-B	Tobacco3482 (RVL-CDIP Pretraining)	95.57%
DocXClassifierFPN-L	Tobacco3482 (RVL-CDIP Pretraining)	95.71%
DocXClassifierFPN-XL	Tobacco3482 (RVL-CDIP Pretraining)	94.86%
DocXClassifierFPN-B	Tobacco3482 (ImageNet Pretraining)	88.43%
DocXClassifierFPN-L	Tobacco3482 (ImageNet Pretraining)	89.57%
DocXClassifierFPN-XL	Tobacco3482 (ImageNet Pretraining)	90.29%

Evaluation on RVL-CDIP:

Please download the RVL-CDIP dataset and place it under the directory $DATA_ROOT_DIR/documents/rvlcdip. Evaluate the DocXClassifier models on the RVL-CDIP dataset using the following script:

./scripts/run/evaluate/document_classification/evaluate_rvlcdip_no_fpn.sh

Evaluate the DocXClassifierFPN models on the RVL-CDIP dataset using the following script:

./scripts/run/evaluate/document_classification/evaluate_rvlcdip_fpn.sh
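Both RVL-CDIP evaluations can be run back to back once the dataset is in place. The following is a minimal sketch that assumes it is run from the repository root. `run_rvlcdip_evals` and the optional `SCRIPTS_DIR` override are hypothetical names introduced only for this example. The check confirms only that `$DATA_ROOT_DIR/documents/rvlcdip` exists, not that its contents are complete:

```shell
# Hypothetical helper: run both RVL-CDIP evaluation scripts from the repository root,
# stopping at the first failure. SCRIPTS_DIR (hypothetical) may override the script location.
run_rvlcdip_evals() {
  local scripts_dir="${SCRIPTS_DIR:-./scripts/run/evaluate/document_classification}"
  local script
  if [ -z "${DATA_ROOT_DIR:-}" ]; then
    echo "error: DATA_ROOT_DIR is not set" >&2
    return 1
  fi
  # The dataset is expected under $DATA_ROOT_DIR/documents/rvlcdip.
  if [ ! -d "$DATA_ROOT_DIR/documents/rvlcdip" ]; then
    echo "error: RVL-CDIP not found at $DATA_ROOT_DIR/documents/rvlcdip" >&2
    return 1
  fi
  # DocXClassifier first, then DocXClassifierFPN.
  for script in evaluate_rvlcdip_no_fpn.sh evaluate_rvlcdip_fpn.sh; do
    echo "==> $script"
    "$scripts_dir/$script" || { echo "error: $script failed" >&2; return 1; }
  done
}
```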

Evaluation on Tobacco3482:

Please download the Tobacco3482 dataset and place it under the directory $DATA_ROOT_DIR/documents/tobacco3482. Evaluate the DocXClassifier models on the Tobacco3482 dataset with ImageNet pretraining using the following script:

./scripts/run/evaluate/document_classification/evaluate_tobacco3482_no_fpn.sh

Evaluate the DocXClassifierFPN models on the Tobacco3482 dataset with ImageNet pretraining using the following script:

./scripts/run/evaluate/document_classification/evaluate_tobacco3482_fpn.sh

Evaluate the DocXClassifier models on the Tobacco3482 dataset with RVL-CDIP pretraining using the following script:

./scripts/run/evaluate/document_classification/evaluate_tobacco3482_rvlcdip_pretrained_no_fpn.sh

Evaluate the DocXClassifierFPN models on the Tobacco3482 dataset with RVL-CDIP pretraining using the following script:

./scripts/run/evaluate/document_classification/evaluate_tobacco3482_rvlcdip_pretrained_fpn.sh
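The four Tobacco3482 scripts above can also be run in one pass. The following is a minimal sketch that assumes it is run from the repository root. `run_tobacco3482_evals`, `SCRIPTS_DIR`, `LOG_DIR`, and the default `./eval_logs` directory are hypothetical choices for this example, not settings used by the repository. Each script's output is saved to its own log file. A failed script does not stop the remaining ones:

```shell
# Hypothetical helper: run all four Tobacco3482 evaluation scripts, saving each
# script's output to a log file and continuing past failures.
run_tobacco3482_evals() {
  local scripts_dir="${SCRIPTS_DIR:-./scripts/run/evaluate/document_classification}"
  local log_dir="${LOG_DIR:-./eval_logs}"
  local failed=0 script log
  mkdir -p "$log_dir"
  for script in \
      evaluate_tobacco3482_no_fpn.sh \
      evaluate_tobacco3482_fpn.sh \
      evaluate_tobacco3482_rvlcdip_pretrained_no_fpn.sh \
      evaluate_tobacco3482_rvlcdip_pretrained_fpn.sh; do
    log="$log_dir/${script%.sh}.log"   # e.g. eval_logs/evaluate_tobacco3482_fpn.log
    echo "==> $script (log: $log)"
    if ! "$scripts_dir/$script" > "$log" 2>&1; then
      echo "error: $script failed" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed script(s) failed"
  [ "$failed" -eq 0 ]   # non-zero exit status if any script failed
}
```

Continuing past failures lets one broken configuration still leave logs for the other three.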

Citation

If you find this useful in your research, please consider citing our associated paper:

@article{Saifullah2024,
  title = {DocXclassifier: towards a robust and interpretable deep neural network for document image classification},
  ISSN = {1433-2825},
  url = {http://dx.doi.org/10.1007/s10032-024-00483-w},
  DOI = {10.1007/s10032-024-00483-w},
  journal = {International Journal on Document Analysis and Recognition (IJDAR)},
  publisher = {Springer Science and Business Media LLC},
  author = {Saifullah,  Saifullah and Agne,  Stefan and Dengel,  Andreas and Ahmed,  Sheraz},
  year = {2024},
  month = jun
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.