GraphKD (Best Student Paper 🏆, ICDAR 2024)

Description

PyTorch implementation of the paper Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation. The model is implemented on top of the detectron2 framework. The proposed architecture explores graph-based knowledge distillation to mitigate the trade-off between the number of trainable model parameters and detection accuracy in document object detection, using an adaptive node sampling strategy and weighted edge distillation via the Mahalanobis distance.

Structured graph creation: we extract the RoI-pooled features and classify them into "text" and "non-text" based on their covariance. We then initialize a node in each identified RoI region and define the adjacency edges. Finally, we iteratively merge the text nodes with an adaptive sample mining strategy to reduce text bias; an illustrative sketch of this idea is shown below.
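The snippet below is a minimal, self-contained sketch of this idea, not the repository's actual implementation: it builds a toy graph whose nodes hold RoI-pooled feature vectors and whose edge weights come from the Mahalanobis distance between node features. All names in it are hypothetical.

```python
# Minimal, illustrative sketch of structured graph creation (hypothetical names,
# not the code used in this repository).
import torch


def build_structured_graph(roi_feats: torch.Tensor):
    """roi_feats: (N, D) RoI-pooled feature vectors, one per detected region."""
    # Covariance of the pooled features; it drives both a simple text/non-text
    # split and the Mahalanobis edge weighting.
    centered = roi_feats - roi_feats.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / max(roi_feats.shape[0] - 1, 1)
    cov_inv = torch.linalg.pinv(cov)

    # Toy text / non-text split: regions whose features deviate least from the
    # mean are treated as "text" candidates (a stand-in for the paper's rule).
    deviation = centered.pow(2).sum(dim=1)
    is_text = deviation < deviation.median()

    # Pairwise Mahalanobis distance between node features.
    diff = roi_feats.unsqueeze(1) - roi_feats.unsqueeze(0)  # (N, N, D)
    maha = torch.einsum("ijd,de,ije->ij", diff, cov_inv, diff).clamp(min=0).sqrt()

    # Weighted adjacency: nearer nodes get stronger edges.
    edge_weights = torch.exp(-maha)
    return is_text, edge_weights


# Example: 6 regions with 16-dimensional pooled features.
is_text, edges = build_structured_graph(torch.randn(6, 16))
```

Iteratively merging the resulting "text" nodes with adaptive sampling, as described in the paper, would then operate on this graph.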

Getting Started

Step 1: Clone this repository and change into the repository root:

git clone https://github.com/ayanban011/GraphKD.git 
cd GraphKD

Step 2: Set up and activate the conda environment with the required dependencies:

conda create --name graphkd python=3.9
conda activate graphkd
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' --user
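After Step 2, a quick (optional) sanity check confirms that PyTorch sees a GPU and that detectron2 imports cleanly:

```python
# Optional environment sanity check for the setup above.
import torch
import detectron2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
```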

Step 3: Run training, evaluation, or debugging

For training:

./start_train.sh train projects/Distillation/configs/Distillation-FasterRCNN-R18-R50-dsig-1x.yaml

For evaluation:

./start_train.sh eval projects/Distillation/configs/Distillation-FasterRCNN-R18-R50-dsig-1x.yaml

For debugging:

./start_train.sh debugtrain projects/Distillation/configs/Distillation-FasterRCNN-R18-R50-dsig-1x.yaml
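If you prefer to inspect or override a config programmatically instead of going through start_train.sh, detectron2's config API should handle these YAML files. Treat the following as a sketch rather than a supported entry point; set_new_allowed is needed because the Distillation project defines keys beyond detectron2's base schema:

```python
# Sketch: load a Distillation config with detectron2's config API.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.set_new_allowed(True)  # the project configs add keys outside the base schema
cfg.merge_from_file(
    "projects/Distillation/configs/Distillation-FasterRCNN-R18-R50-dsig-1x.yaml"
)

print(cfg.OUTPUT_DIR)          # e.g. inspect where checkpoints will be written
cfg.SOLVER.IMS_PER_BATCH = 4   # example override before launching training
```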

Results and Model Zoo

1. PubLayNet

| Model | Config file | Weights | AP |
| --- | --- | --- | --- |
| R50-R18 | config-publay | model | 28.0 |
| R101-R50 | config-publay | model | 88.6 |
| R152-R101 | config-publay | model | 88.8 |
| R101-EB0 | config-publay | model | 27.6 |
| R50-MNv2 | config-publay | model | 28.2 |

2. PRImA

| Model | Config file | Weights | AP |
| --- | --- | --- | --- |
| R50-R18 | config-prima | model | 26.5 |
| R101-R50 | config-prima | model | 35.0 |
| R152-R101 | config-prima | model | 41.9 |
| R101-EB0 | config-prima | model | 12.6 |
| R50-MNv2 | config-prima | model | 14.9 |

3. Historical Japanese

| Model | Config file | Weights | AP |
| --- | --- | --- | --- |
| R50-R18 | config-prima | model | 33.4 |
| R101-R50 | config-prima | model | 78.3 |
| R152-R101 | config-prima | model | 79.7 |
| R101-EB0 | config-prima | model | 33.1 |
| R50-MNv2 | config-prima | model | 37.5 |

4. DocLayNet

| Model | Config file | Weights | AP |
| --- | --- | --- | --- |
| R50-R18 | config-prima | model | 42.1 |
| R101-R50 | config-prima | model | 65.0 |
| R152-R101 | config-prima | model | 68.9 |
| R101-EB0 | config-prima | model | 28.9 |
| R50-MNv2 | config-prima | model | 23.6 |
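To try one of the released checkpoints directly, the weights can be loaded into a model built from the matching config using detectron2's checkpointing utilities. The following is a rough sketch: the paths are placeholders, and it assumes the Distillation project modules are importable so the custom architectures are registered.

```python
# Rough sketch: load a downloaded GraphKD checkpoint with detectron2 utilities.
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer

cfg = get_cfg()
cfg.set_new_allowed(True)
cfg.merge_from_file("projects/Distillation/configs/<config>.yaml")  # placeholder
cfg.MODEL.WEIGHTS = "/path/to/downloaded/model.pth"                 # placeholder

model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()  # ready for inference / evaluation
```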

Citation

If you find this useful for your research, please cite it as follows:

@article{banerjee2024graphkd,
  title={GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation},
  author={Banerjee, Ayan and Biswas, Sanket and Llad{\'o}s, Josep and Pal, Umapada},
  journal={arXiv preprint arXiv:2402.11401},
  year={2024}
}

Acknowledgement

We have built this work on top of Dsig.

Conclusion

Thank you for your interest in our work, and apologies for any bugs you may encounter.