This is the official code of "Self-Supervised Implicit Glyph Attention for Text Recognition". For more details, please refer to our CVPR 2023 paper, poster, or Chinese-language explanation. If you have any questions, please contact me by email (gtk0615@sjtu.edu.cn).
We have also released our ICCV 2023 work on scene text recognition:
# V100 Ubuntu 16.04 Cuda 10
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install tensorboard==1.15.0
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# 3090 Ubuntu 16.04 Cuda 11
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install tensorboard==2.11.2
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# If you hit a setuptools-related error, pin setuptools:
# pip uninstall setuptools
# pip install setuptools==58.0.4
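After either install, a quick check that the packages are importable can save a failed training launch. The following is a minimal sketch, not part of the repository: `find_missing` and `REQUIRED_MODULES` are hypothetical names, and the list simply mirrors the `pip install` lines above using each package's import name (for example `opencv-python` is imported as `cv2`, `Pillow` as `PIL`).

```python
import importlib.util

# Import names for the packages installed above (assumed list, mirrors the pip lines).
REQUIRED_MODULES = [
    "torch", "torchvision", "tensorboard", "tensorboardX",
    "cv2", "PIL", "lmdb", "nltk", "six", "natsort", "scipy",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch  # only imported once we know it is installed
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

Run it inside the activated `SIGA` environment; an empty "missing" list plus `CUDA available: True` suggests the environment is ready.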
-- root_path
    --training
        --MJ
        --ST
    --validation
    --evaluation
        --SVT
        --IIIK
        --...
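Before training, it can help to confirm that the dataset folders are where the scripts expect them. Below is a minimal sketch under stated assumptions: `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and the nesting (MJ and ST under `training`, benchmark sets such as SVT under `evaluation`) is inferred from the listing above rather than taken from the code.

```python
from pathlib import Path

# Assumed nesting: MJ and ST under training/, benchmark sets under evaluation/.
EXPECTED_DIRS = [
    "training/MJ",
    "training/ST",
    "validation",
    "evaluation",
]


def missing_dirs(root_path, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root_path."""
    root = Path(root_path)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dirs("/xxx/dataset/data_lmdb")` with your own root path returns an empty list when every expected folder is present.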
- Dataset link:
- Weight link:
- Optional: K-means results (please refer to CCD)
cd ./mask_create
python generate_mask.py  # generate masks in parallel, writing multiple LMDB files
python merge.py          # merge the LMDB files into a single file
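The merge step combines several LMDB shards into one. The sketch below illustrates the general idea only and is not the repository's `merge.py`: it uses plain dicts as stand-ins for LMDB files, and it assumes each shard stores a `num-samples` entry plus per-sample keys of the form `<prefix>-%09d`, a convention common in scene text recognition LMDB datasets. The key prefixes and the name `merge_shards` are hypothetical; the actual mask files may use different keys.

```python
def merge_shards(shards, prefixes=("image", "label")):
    """Merge key-value shards into one dict, renumbering samples from 1.

    Each shard is a dict standing in for one LMDB file, with a
    b"num-samples" entry and per-sample keys such as b"image-000000001".
    The key layout is an assumption, not taken from the repository.
    """
    merged = {}
    total = 0
    for shard in shards:
        n = int(shard[b"num-samples"].decode())
        for i in range(1, n + 1):
            total += 1  # global index of this sample in the merged store
            for prefix in prefixes:
                old_key = f"{prefix}-{i:09d}".encode()
                new_key = f"{prefix}-{total:09d}".encode()
                merged[new_key] = shard[old_key]
    merged[b"num-samples"] = str(total).encode()
    return merged
```

With the `lmdb` package, the same loop would read each value with `txn.get(old_key)` from a shard's read transaction and write it with `txn.put(new_key, value)` in the output environment's write transaction.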
# --mask_path is optional
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py \
    --model_name TRBA --exp_name SIGA --Aug --batch_size 320 --num_iter 160000 \
    --select_data synth --benchmark_all_eval \
    --train_data /xxx/dataset/data_lmdb/training/label/Synth/ \
    --eval_data /xxx/dataset/data_lmdb/evaluation/ \
    --mask_path /xxx/dataset/data_lmdb/Mask \
    --workers 12
python test.py --eval_data /xxx/xxx --select_data xxx
- Refactor and clean code
If you find our method useful for your research, please cite:
@inproceedings{guan2023self,
title={Self-Supervised Implicit Glyph Attention for Text Recognition},
author={Guan, Tongkun and Gu, Chaochen and Tu, Jingzheng and Yang, Xue and Feng, Qi and Zhao, Yudi and Shen, Wei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15285--15294},
year={2023}
}
- This code is free for academic research purposes only and is licensed under the 2-clause BSD License; see the LICENSE file for details.