An implementation of the CVPR 2020 paper "SCATTER: Selective Context Attentional Scene Text Recognizer"

Introduction

This is an unofficial implementation of the paper "SCATTER: Selective Context Attentional Scene Text Recognizer", published at CVPR 2020.

Getting Started

Dependency

This work was tested with PyTorch 1.6.0, CUDA 10.2, Python 3.6.10, and Ubuntu 18.04.

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

Requirements: lmdb, pillow, nltk, natsort

pip3 install lmdb pillow nltk natsort
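
After installing, it can be useful to confirm that every listed package is importable from the active environment. The following is a minimal sketch, not part of this repository: the helper name missing_packages is hypothetical, and the mapping only encodes the module names these packages are usually imported under (pillow, for example, is imported as PIL).

```python
import importlib.util

# Map each package name to the module name it is imported under.
# Assumption: these are the usual import names; pillow is imported as PIL.
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "lmdb": "lmdb",
    "pillow": "PIL",
    "nltk": "nltk",
    "natsort": "natsort",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks for the modules and does not compare versions, so it will not catch a PyTorch or CUDA mismatch.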

Dataset

Training datasets: MJSynth (MJ) [1], SynthText (ST) [2], and SynthAdd (SA) [3].
Validation datasets: the union of the training sets of IC13 [4], IC15 [5], IIIT [6], and SVT [7].
Evaluation datasets: the benchmark evaluation sets IIIT [6], SVT [7], IC03 [8], IC13 [4], IC15 [5], SVTP [9], and CUTE [10].

Pretrained Model

Two pretrained models are provided (they will be updated when better models are trained):

non-sensitive: covers the ten digits (0-9) and the 26 letters (a-z), ignoring case.
sensitive: covers all readable characters, distinguishing upper and lower case.
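
The two character sets can be sketched as follows. This is an illustration under an assumption, not code from this repository: it assumes the same convention as deep-text-recognition-benchmark, where the case-sensitive set is the 94 printable ASCII characters without whitespace. The names NON_SENSITIVE_CHARS, SENSITIVE_CHARS, and select_charset are hypothetical.

```python
import string

# Non-sensitive set: ten digits plus 26 lowercase letters (36 symbols).
NON_SENSITIVE_CHARS = string.digits + string.ascii_lowercase

# Sensitive set (assumed): digits, upper and lower case letters, and
# punctuation. Dropping the last six characters of string.printable
# removes the whitespace characters, leaving 94 symbols.
SENSITIVE_CHARS = string.printable[:-6]


def select_charset(sensitive):
    """Return the character set a model is assumed to be trained on."""
    return SENSITIVE_CHARS if sensitive else NON_SENSITIVE_CHARS


if __name__ == "__main__":
    print(len(select_charset(False)))  # 36
    print(len(select_charset(True)))   # 94
```

Because the size of the output layer depends on the character set, a checkpoint trained with one set would normally not load into a model built for the other.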

Pretrained models can be downloaded here

Run demo

With the non-sensitive model

python demo.py --saved_model scatter-case-non-sensitive.pth --image_folder <path_to_image_folder>

With the sensitive model

python demo.py --saved_model scatter-case-sensitive.pth --sensitive --image_folder <path_to_image_folder>

Training and evaluation

Download the lmdb datasets for training and evaluation provided by deep-text-recognition-benchmark from here

Download the additional dataset SynthText_Add (SA) for training from here (includes raw images and lmdb format).

Training

python3 train.py --train_data data_lmdb_release/training --valid_data data_lmdb_release/validation --select_data MJ-ST-SA --batch_ratio 0.4-0.4-0.2 --sensitive
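
The --select_data and --batch_ratio values pair up position by position, so each training batch draws 40% of its samples from MJ, 40% from ST, and 20% from SA. The sketch below shows that arithmetic. It assumes the rounding rule used in deep-text-recognition-benchmark (round to the nearest integer, at least one sample per dataset); the function name split_batch and the batch size of 192 are hypothetical choices for illustration.

```python
def split_batch(batch_size, select_data, batch_ratio):
    """Split a batch across datasets according to the given ratios."""
    names = select_data.split("-")
    ratios = [float(r) for r in batch_ratio.split("-")]
    if len(names) != len(ratios):
        raise ValueError("select_data and batch_ratio must have equal length")
    # Assumed rule: round each share, but keep at least one sample per dataset.
    return {
        name: max(round(batch_size * ratio), 1)
        for name, ratio in zip(names, ratios)
    }


if __name__ == "__main__":
    # Hypothetical batch size; the value actually used by train.py may differ.
    print(split_batch(192, "MJ-ST-SA", "0.4-0.4-0.2"))
    # {'MJ': 77, 'ST': 77, 'SA': 38}
```

With rounding, the per-dataset counts do not always sum exactly to the requested batch size, although they do in this example (77 + 77 + 38 = 192).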

Testing

python3 test.py --eval_data data_lmdb_release/evaluation --saved_model scatter-case-sensitive.pth --sensitive --data_filtering_off
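
The command above evaluates a case-sensitive model on benchmarks that are conventionally scored without regard to case or punctuation. A plausible reading, assuming test.py behaves like its counterpart in deep-text-recognition-benchmark, is that predictions and labels are lowercased and reduced to alphanumeric characters before comparison. The sketch below illustrates that scoring; the names normalize and word_accuracy are hypothetical and the code is not taken from this repository.

```python
import re

# Assumed scoring alphabet: digits and lowercase letters only.
ALPHANUMERIC = "0123456789abcdefghijklmnopqrstuvwxyz"


def normalize(text):
    """Lowercase the text and drop every non-alphanumeric character."""
    return re.sub(f"[^{ALPHANUMERIC}]", "", text.lower())


def word_accuracy(predictions, ground_truths):
    """Return the percentage of words that match after normalization."""
    pairs = list(zip(predictions, ground_truths))
    if not pairs:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    preds = ["Hello!", "w0rld"]
    labels = ["hello", "world"]
    print(word_accuracy(preds, labels))  # 50.0
```

Under this scoring, "Hello!" counts as a correct reading of "hello", which is what makes the sensitive and non-sensitive rows of a results table comparable.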

Reported results

Results on the evaluation set here.
They are compared with the results reported in the original paper and with the baseline model.

Model	IIIT5K	SVT	IC03	IC13	Regular Text	IC15	SVTP	CUTE	Irregular Text
Paper (non-sensitive)	93.7	92.7	96.3	93.9	94.0	82.2	86.9	87.5	83.7
Baseline	87.9	87.5	94.9	92.3	89.8	71.8	79.2	74.0	73.6
Ours (sensitive)	93.5	90.9	95.0	93.6	93.4	78.6	83.4	83.3	80.0
Ours (non-sensitive)	93.8	90.9	95.3	93.8	93.7	79.7	85.0	86.1	81.5
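
To read the table as differences rather than absolute numbers, the rows can be subtracted column by column. The sketch below copies the values from the table above; the dictionary keys and the helper name gap are hypothetical labels chosen for illustration.

```python
COLUMNS = [
    "IIIT5K", "SVT", "IC03", "IC13", "Regular Text",
    "IC15", "SVTP", "CUTE", "Irregular Text",
]

# Values copied from the results table.
RESULTS = {
    "paper_non_sensitive": [93.7, 92.7, 96.3, 93.9, 94.0, 82.2, 86.9, 87.5, 83.7],
    "baseline": [87.9, 87.5, 94.9, 92.3, 89.8, 71.8, 79.2, 74.0, 73.6],
    "ours_sensitive": [93.5, 90.9, 95.0, 93.6, 93.4, 78.6, 83.4, 83.3, 80.0],
    "ours_non_sensitive": [93.8, 90.9, 95.3, 93.8, 93.7, 79.7, 85.0, 86.1, 81.5],
}


def gap(row, reference):
    """Return per-column differences (row minus reference), in points."""
    return {
        column: round(a - b, 1)
        for column, a, b in zip(COLUMNS, RESULTS[row], RESULTS[reference])
    }


if __name__ == "__main__":
    for column, diff in gap("ours_non_sensitive", "paper_non_sensitive").items():
        print(f"{column:>15}: {diff:+.1f}")
```

On the Irregular Text column, for example, the non-sensitive model is 2.2 points below the paper (81.5 vs 83.7) and 7.9 points above the baseline (81.5 vs 73.6).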

Acknowledgements

This code is built upon deep-text-recognition-benchmark.

Reference

[1] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.
[2] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
[3] H. Li, P. Wang, C. Shen, and G. Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI, 2019.
[4] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.
[5] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.
[6] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.
[7] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.
[8] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.
[9] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.
[10] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.
[11] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{litman2020scatter,
  title={SCATTER: selective context attentional scene text recognizer},
  author={Litman, Ron and Anschel, Oron and Tsiper, Shahar and Litman, Roee and Mazor, Shai and Manmatha, R},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11962--11972},
  year={2020}
}

yusirhhh/cvpr20-scatter-text-recognizer