SE_ASTER

Introduction

This is the implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition". The code is based on aster.pytorch; we sincerely thank ayumiymk for his awesome repo and help.

How to use

Env

PyTorch == 1.1.0
torchvision == 0.3.0
fasttext == 0.9.1

Details can be found in requirements.txt

Train

Prepare your data
  • Download the pretrained language model (bin) from here
  • Update the path in lib/tools/create_all_synth_lmdb.py
  • Run lib/tools/create_all_synth_lmdb.py
  • Note: the generated LMDB stores a word embedding for every sample, so it can take up a lot of disk space. To avoid this, you can modify datasets/dataset.py to generate the word embeddings online, at data-loading time.
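The online alternative mentioned in the note could look roughly like the sketch below. It is not the code in datasets/dataset.py: the helper names (`make_cached_embedder`, `load_fasttext_embedder`), the lowercasing of labels, and the cache size are all assumptions made for illustration. The only fasttext calls used are `fasttext.load_model` and `get_word_vector`.

```python
from functools import lru_cache
from typing import Callable, List, Sequence


def make_cached_embedder(get_vector: Callable[[str], Sequence[float]],
                         max_words: int = 100_000) -> Callable[[str], List[float]]:
    """Wrap a word -> vector lookup with an in-memory LRU cache (hypothetical helper)."""

    @lru_cache(maxsize=max_words)
    def _lookup(word: str):
        # Store an immutable copy so cached entries cannot be modified by callers.
        return tuple(float(x) for x in get_vector(word))

    def embed(label: str) -> List[float]:
        # Lowercasing is an assumption about how labels are normalised.
        return list(_lookup(label.lower()))

    return embed


def load_fasttext_embedder(model_path: str) -> Callable[[str], List[float]]:
    """Build an embedder from a fastText .bin model (hypothetical helper)."""
    import fasttext  # imported lazily so this module loads without fasttext installed

    model = fasttext.load_model(model_path)
    return make_cached_embedder(model.get_word_vector)
```

A dataset's `__getitem__` could then call `embed(label)` for each sample instead of reading a stored vector, trading disk space for a small amount of compute per unseen word.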
Run
  • Update the path in train.sh, then
sh train.sh

Test

  • Update the path in the test.sh, then
sh test.sh

Experiments

Evaluation on benchmarks

  • You can download the benchmark datasets from BaiduYun (key: nphk) shared by clovaai in this repo.
Checkpoint                    IIIT5K  IC13-1015  IC13-857  IC15-1811  IC15-2077  SVT   SVTP  CUTE
OneDrive BaiduYun(key: x54e)  93.4    93.5       94.5      79.8       75.8       88.4  82.0  84.0

Evaluation with lexicons

  • Existing methods replace the predicted word with the nearest lexicon word under the edit distance (ED) metric. With the semantic information, we can instead choose the most semantically similar (SS) word among the lexicon words at the nearest edit distance.
Methods  IIIT5K-50  IIIT5K-1K  SVT-50  IC13   IC15
ED       99.06      97.87      96.36   97.44  87.76
ED + SS  99.27      97.93      96.45   97.64  88.07
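One reading of the ED + SS selection is sketched below: compute the edit distance to every lexicon word, keep the words tied at the minimum distance, and break the tie by cosine similarity between the predicted semantic vector and each candidate's word embedding. The function names, the use of cosine similarity, and the `embeddings` dictionary are assumptions for illustration, not the repo's evaluation code.

```python
import math
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_lexicon_word(predicted: str,
                      semantic_vec: Sequence[float],
                      lexicon: Sequence[str],
                      embeddings: Dict[str, Sequence[float]]) -> str:
    """ED + SS: nearest edit distance first, semantic similarity as tie-breaker."""
    distances = {word: edit_distance(predicted, word) for word in lexicon}
    best = min(distances.values())
    candidates = [word for word in lexicon if distances[word] == best]
    return max(candidates, key=lambda w: cosine(semantic_vec, embeddings[w]))
```

With plain ED, ties between equally distant lexicon words are resolved arbitrarily; the semantic vector gives a principled way to choose between them.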

About the word embedding

  • Directly use the word embeddings from the pre-trained LM during both training and inference.
IIIT5K  IC13  IC15-1811  IC15-2077  SVT   SVTP  CUTE
94.6    93.8  85.0       79.6       90.9  84.2  85.4

Exploration on global information

  • We try using Aggregation Cross-Entropy as the global information instead of the semantics. This part of the code will be released in the next few days.
IIIT5K  IC13  IC15-1811  IC15-2077  SVT   SVTP  CUTE
93.8    91.3  78.7       -          90.1  81.6  81.9
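For readers unfamiliar with Aggregation Cross-Entropy (ACE), the sketch below shows the loss in its commonly described form: per-class probabilities are summed over all time steps, normalised by the sequence length T, and compared with the normalised character counts of the label, with the blank class assigned the remaining T - |label| steps. This is a pure-Python illustration under that assumption, not the unreleased code from this repo; the function name and the `eps` smoothing term are ours.

```python
import math
from collections import Counter
from typing import Sequence


def ace_loss(probs: Sequence[Sequence[float]],
             target: Sequence[int],
             blank: int = 0,
             eps: float = 1e-10) -> float:
    """ACE loss for one sequence.

    probs:  T x C per-step class probabilities (each row sums to 1).
    target: label as class indices, excluding the blank class.
    """
    num_steps = len(probs)
    num_classes = len(probs[0])
    if len(target) > num_steps:
        raise ValueError("label is longer than the number of time steps")

    # Aggregate each class's probability over time and normalise by T.
    y_bar = [sum(step[k] for step in probs) / num_steps for k in range(num_classes)]

    # Count how often each character occurs; the blank fills the remaining steps.
    counts = Counter(target)
    counts[blank] = num_steps - len(target)
    n_bar = [counts.get(k, 0) / num_steps for k in range(num_classes)]

    # Cross-entropy between the normalised counts and the aggregated prediction.
    return -sum(n * math.log(y + eps) for n, y in zip(n_bar, y_bar) if n > 0)
```

Because only character counts enter the loss, it ignores character order, which is why it acts as a global signal about the word rather than a per-position one.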

Citation

@inproceedings{qiao2020seed,
  title={{SEED}: Semantics enhanced encoder-decoder framework for scene text recognition},
  author={Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping},
  booktitle={CVPR},
  year={2020},
}
@article{shi2018aster,
  title={{ASTER}: An attentional scene text recognizer with flexible rectification},
  author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  journal={TPAMI},
  volume={41},
  number={9},
  pages={2035--2048},
  year={2018},
  publisher={IEEE}
}