
🥇SOTA Document Image Enhancement - A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

The official PyTorch code for the project A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement.

Description

We propose a Tokens-to-Token Transformer network for document image enhancement: a novel encoder-decoder architecture built on a tokens-to-token vision transformer.

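To make the architecture concrete, below is a minimal PyTorch sketch of a single tokens-to-token (T2T) step, the operation the encoder builds on. This is an illustrative sketch only, not the repository's implementation; the class name T2TStep and its hyperparameters are hypothetical.

import torch
import torch.nn as nn

class T2TStep(nn.Module):
    # One tokens-to-token step: reshape the token sequence back into a 2-D
    # token map, softly split it into overlapping patches with Unfold, and
    # linearly project the concatenated patches into a new, shorter sequence.
    def __init__(self, in_dim, out_dim, kernel=3, stride=2):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size=kernel, stride=stride, padding=1)
        self.proj = nn.Linear(in_dim * kernel * kernel, out_dim)

    def forward(self, tokens, h, w):
        # tokens: (batch, h*w, in_dim)
        x = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
        x = self.unfold(x)        # (batch, in_dim*kernel*kernel, n_new)
        x = x.transpose(1, 2)     # (batch, n_new, in_dim*kernel*kernel)
        return self.proj(x)       # (batch, n_new, out_dim)

# e.g., T2TStep(64, 128)(tokens, 16, 16) merges a 16x16 token map (256 tokens)
# into an 8x8 map (64 tokens) of higher-dimensional tokens.

Stacking a few such steps before the transformer encoder progressively aggregates neighboring tokens, which is the layer-wise tokens-to-token structure the title refers to.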

Step 1 - Download Code

Clone the repository to your desired location:

git clone https://github.com/RisabBiswas/T2T-BinFormer
cd T2T-BinFormer

Step 2 - Process Data

Data Path

The research and experiments are conducted on the DIBCO and H-DIBCO datasets. Find the dataset here - Link. After downloading, extract the folder named DIBCOSETS and place it in your desired data path, i.e., /YOUR_DATA_PATH/DIBCOSETS/
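
For orientation, the data directory should look roughly like the sketch below. The year folders follow the DIBCO releases; the exact subfolder names expected inside each year are defined by process_dibco.py, so treat this layout as illustrative:

/YOUR_DATA_PATH/
└── DIBCOSETS/
    ├── 2009/   (degraded images and their ground-truth binarizations)
    ├── 2010/
    ├── ...
    ├── 2016/   (validation set in the example below)
    └── 2018/   (testing set in the example below)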

Additional Data Path

  • PALM Dataset - Link
  • Persian Heritage Image Binarization Dataset - Link
  • Degraded Maps - Link

Data Splitting

Specify the data path, split size, and the validation and testing sets to prepare your data. In this example, we set the split size to 256 × 256, the validation set to DIBCO 2016, and the testing set to DIBCO 2018 while running process_dibco.py:

python process_dibco.py --data_path /YOUR_DATA_PATH/ --split_size 256 --testing_dataset 2018 --validation_dataset 2016
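
Conceptually, this step turns each document image into fixed-size patches. The standalone sketch below shows the general idea of such patch extraction, assuming white border padding; the function split_into_patches is hypothetical and is not the script's actual code.

import numpy as np
from PIL import Image

def split_into_patches(image_path, split_size=256):
    # Pad the image with white pixels so its dimensions divide evenly,
    # then cut it into non-overlapping split_size x split_size patches.
    img = np.array(Image.open(image_path).convert("RGB"))
    h, w = img.shape[:2]
    pad_h = (split_size - h % split_size) % split_size
    pad_w = (split_size - w % split_size) % split_size
    img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=255)
    return [img[y:y + split_size, x:x + split_size]
            for y in range(0, img.shape[0], split_size)
            for x in range(0, img.shape[1], split_size)]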

Using T2T-BinFormer

Step 3 - Training

For training, specify the desired settings (batch size, patch size, model size, split size, and number of training epochs) when running train.py. For example, to train a base model with a 16 × 16 patch size and a batch size of 32:

python train.py --data_path /YOUR_DATA_PATH/ --batch_size 32 --vit_model_size base --vit_patch_size 16 --epochs 151 --split_size 256 --validation_dataset 2016

Visualization results on the validation set are saved after each epoch in an automatically created folder named vis+"YOUR_EXPERIMENT_SETTINGS"; for the command above, that folder is visbase_256_16. The best model weights are saved in a folder named "weights".
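
For readers who want to peek inside, the essence of an enhancement training loop is a per-pixel reconstruction loss between predicted and ground-truth patches. The sketch below is a simplified assumption of that setup; the helper train_one_epoch is hypothetical, and train.py is authoritative for the actual loop and loss.

import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cuda"):
    # Each batch pairs degraded input patches with clean ground-truth patches;
    # the model is optimized with a per-pixel reconstruction loss (assumed MSE).
    model.train()
    criterion = nn.MSELoss()
    for degraded, clean in loader:
        degraded, clean = degraded.to(device), clean.to(device)
        optimizer.zero_grad()
        loss = criterion(model(degraded), clean)
        loss.backward()
        optimizer.step()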

Step 4 - Testing on a DIBCO dataset

To test the trained model on a specific DIBCO dataset, make sure it matches the testing dataset specified in Step 2 (if not, run process_dibco.py again), point --model_weights_path to your trained weights, and run the command below. In this example, we test on H-DIBCO 2017 using the base model with a 16 × 16 patch size and a batch size of 16. The binarized images will be written to ./vis+"YOUR_CONFIGS_HERE"/epoch_testing/

python test.py --data_path /YOUR_DATA_PATH/ --model_weights_path /THE_MODEL_WEIGHTS_PATH/ --batch_size 16 --vit_model_size base --vit_patch_size 16 --split_size 256 --testing_dataset 2017
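
DIBCO results are typically reported with metrics such as PSNR and F-measure. If you want a quick sanity check on the binarized outputs, the helper below computes PSNR between an output image and its ground truth; it is a standard formula but the function itself is hypothetical and not part of this repo.

import numpy as np
from PIL import Image

def psnr(pred_path, gt_path):
    # Peak signal-to-noise ratio between two grayscale images.
    pred = np.array(Image.open(pred_path).convert("L"), dtype=np.float64)
    gt = np.array(Image.open(gt_path).convert("L"), dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)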

Results

The results of our model can be found Here.

Acknowledgement

Our project adapts its code structure from DocEnTr. We are thankful to the authors! We also really appreciate Phil Wang's great work on vit_pytorch.

Authors

  • Risab Biswas
  • Swalpa Kumar Roy
  • Umapada Pal

Citation

If you use the T2T-BinFormer code in your research, we would appreciate a citation to the original paper:

  @misc{biswas2023layerwise,
        title={A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement}, 
        author={Risab Biswas and Swalpa Kumar Roy and Umapada Pal},
        year={2023},
        eprint={2312.03946},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }

Contact

If you have any questions, please feel free to reach out to Risab Biswas.

Conclusion

We really appreciate your interest in our research. The code should be free of bugs, but if you run into any, we are sorry about that; please let us know in the issues section, and we will fix them ASAP! Cheers!