DocXclassifier: towards a robust and interpretable deep neural network for document image classification
This repository contains the evaluation code for the paper "DocXclassifier: towards a robust and interpretable deep neural network for document image classification" by Saifullah Saifullah, Stefan Agne, Andreas Dengel, and Sheraz Ahmed.
Requires Python 3+. To run the evaluation, follow the steps below.
Clone the repository together with its submodules:
git clone https://github.com/saifullah3396/docxclassifier.git --recursive
Install the dependencies:
pip install -r requirements.txt
Set the environment variables:
export PYTHONPATH=./external/torchfusion/src
export DATA_ROOT_DIR=</your/dataset/dir> # root directory under which the datasets are placed
export TORCH_FUSION_CACHE_DIR=</your/cache/dir> # can be any directory; datasets are cached here
export TORCH_FUSION_OUTPUT_DIR=</your/output/dir> # can be any directory; model outputs are generated here
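Since the evaluation scripts depend on all four variables, it can help to confirm they are set before running anything. The following is a minimal sketch, not part of the repository: `check_env` is a hypothetical helper name, and it only checks that each variable is non-empty, not that the directories exist.

```shell
# Hypothetical helper (not part of the repository): report any variable
# from the setup step that is unset or empty.
check_env() {
  missing=0
  for var in PYTHONPATH DATA_ROOT_DIR TORCH_FUSION_CACHE_DIR TORCH_FUSION_OUTPUT_DIR; do
    # Indirect lookup: copy the value of the variable named in $var into $value.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "not set: $var" >&2
      missing=1
    fi
  done
  # Returns 0 when every variable is set, 1 otherwise.
  return "$missing"
}
```

Running `check_env || echo "fix the variables above first"` in the same shell session lists every variable that still needs a value.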
Accuracy of the DocXClassifier models (without FPN):

Model | Dataset | Accuracy |
---|---|---|
DocXClassifier-B | RVL-CDIP | 94.00% |
DocXClassifier-L | RVL-CDIP | 94.15% |
DocXClassifier-XL | RVL-CDIP | 94.17% |
DocXClassifier-B | Tobacco3482 (RVL-CDIP Pretraining) | 95.29% |
DocXClassifier-L | Tobacco3482 (RVL-CDIP Pretraining) | 95.57% |
DocXClassifier-XL | Tobacco3482 (RVL-CDIP Pretraining) | 95.43% |
DocXClassifier-B | Tobacco3482 (ImageNet Pretraining) | 87.43% |
DocXClassifier-L | Tobacco3482 (ImageNet Pretraining) | 88.43% |
DocXClassifier-XL | Tobacco3482 (ImageNet Pretraining) | 90.14% |

Accuracy of the DocXClassifierFPN models:

Model | Dataset | Accuracy |
---|---|---|
DocXClassifierFPN-B | RVL-CDIP | 94.04% |
DocXClassifierFPN-L | RVL-CDIP | 94.13% |
DocXClassifierFPN-XL | RVL-CDIP | 94.19% |
DocXClassifierFPN-B | Tobacco3482 (RVL-CDIP Pretraining) | 95.57% |
DocXClassifierFPN-L | Tobacco3482 (RVL-CDIP Pretraining) | 95.71% |
DocXClassifierFPN-XL | Tobacco3482 (RVL-CDIP Pretraining) | 94.86% |
DocXClassifierFPN-B | Tobacco3482 (ImageNet Pretraining) | 88.43% |
DocXClassifierFPN-L | Tobacco3482 (ImageNet Pretraining) | 89.57% |
DocXClassifierFPN-XL | Tobacco3482 (ImageNet Pretraining) | 90.29% |
Please download the RVL-CDIP dataset and place it under the directory $DATA_ROOT_DIR/documents/rvlcdip. Evaluate the DocXClassifier models on the RVL-CDIP dataset using the following script:
./scripts/run/evaluate/document_classification/evaluate_rvlcdip_no_fpn.sh
Evaluate the DocXClassifierFPN models on the RVL-CDIP dataset using the following script:
./scripts/run/evaluate/document_classification/evaluate_rvlcdip_fpn.sh
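A misplaced dataset is an easy mistake to make, so the expected location can be checked before launching an evaluation. This is a sketch under the assumption that the only requirement is a directory at `$DATA_ROOT_DIR/documents/<name>`; `dataset_dir` and `check_dataset` are hypothetical helper names, and the internal layout of the dataset is not verified.

```shell
# Hypothetical helpers (not part of the repository).
# Build the path $DATA_ROOT_DIR/documents/<name> for a dataset name.
dataset_dir() {
  root=${DATA_ROOT_DIR:-}
  # ${root%/} drops one trailing slash so the result has no "//".
  printf '%s/documents/%s\n' "${root%/}" "$1"
}

# Succeed only if the dataset directory exists.
check_dataset() {
  dir=$(dataset_dir "$1")
  if [ -d "$dir" ]; then
    echo "found: $dir"
  else
    echo "missing: $dir" >&2
    return 1
  fi
}
```

For example, `check_dataset rvlcdip` reports whether the RVL-CDIP directory is where the scripts above expect it.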
Please download the Tobacco3482 dataset and place it under the directory $DATA_ROOT_DIR/documents/tobacco3482. Evaluate the DocXClassifier models on the Tobacco3482 dataset with ImageNet pretraining using the following script:
./scripts/run/evaluate/document_classification/evaluate_tobacco3482_no_fpn.sh
Evaluate the DocXClassifierFPN models on the Tobacco3482 dataset with ImageNet pretraining using the following script:
./scripts/run/evaluate/document_classification/evaluate_tobacco3482_fpn.sh
Evaluate the DocXClassifier models on the Tobacco3482 dataset with RVL-CDIP pretraining using the following script:
./scripts/run/evaluate/document_classification/evaluate_tobacco3482_rvlcdip_pretrained_no_fpn.sh
Evaluate the DocXClassifierFPN models on the Tobacco3482 dataset with RVL-CDIP pretraining using the following script:
./scripts/run/evaluate/document_classification/evaluate_tobacco3482_rvlcdip_pretrained_fpn.sh
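To reproduce all of the reported numbers in one go, the six scripts above can be run in sequence. The loop below is a sketch rather than something shipped with the repository: `run_all_evaluations` and `EVAL_SCRIPT_DIR` are hypothetical names, and it assumes the scripts are executable and are launched from the repository root.

```shell
# Directory holding the evaluation scripts listed above (assumed relative
# to the repository root); override EVAL_SCRIPT_DIR if your layout differs.
EVAL_SCRIPT_DIR=${EVAL_SCRIPT_DIR:-./scripts/run/evaluate/document_classification}

# Hypothetical helper (not part of the repository): run every evaluation
# script, continue after a failure, and report how many failed.
run_all_evaluations() {
  failed=0
  for name in \
    evaluate_rvlcdip_no_fpn \
    evaluate_rvlcdip_fpn \
    evaluate_tobacco3482_no_fpn \
    evaluate_tobacco3482_fpn \
    evaluate_tobacco3482_rvlcdip_pretrained_no_fpn \
    evaluate_tobacco3482_rvlcdip_pretrained_fpn
  do
    echo "running: $name"
    if ! "$EVAL_SCRIPT_DIR/$name.sh"; then
      echo "failed: $name" >&2
      failed=$((failed + 1))
    fi
  done
  echo "scripts failed: $failed"
  # Exit status is 0 only when every script succeeded.
  [ "$failed" -eq 0 ]
}
```

Continuing after a failure is a deliberate choice here: a missing Tobacco3482 download, for instance, does not prevent the RVL-CDIP evaluations from completing.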
If you find this useful in your research, please consider citing our associated paper:
@article{Saifullah2024,
title = {DocXclassifier: towards a robust and interpretable deep neural network for document image classification},
ISSN = {1433-2825},
url = {http://dx.doi.org/10.1007/s10032-024-00483-w},
DOI = {10.1007/s10032-024-00483-w},
journal = {International Journal on Document Analysis and Recognition (IJDAR)},
publisher = {Springer Science and Business Media LLC},
author = {Saifullah, Saifullah and Agne, Stefan and Dengel, Andreas and Ahmed, Sheraz},
year = {2024},
month = jun
}
This repository is released under the Apache 2.0 license as found in the LICENSE file.