cvpr20-scatter-text-recognizer

Unofficial implementation of the CVPR 2020 paper "SCATTER: Selective Context Attentional Scene Text Recognizer"

Primary language: Python. License: Apache-2.0.

Paper | Pretrained model

Introduction

This is an unofficial implementation of the paper "SCATTER: Selective Context Attentional Scene Text Recognizer", published at CVPR 2020.

Getting Started

Dependency

  • This work was tested with PyTorch 1.6.0, CUDA 10.2, Python 3.6.10, and Ubuntu 18.04.
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
  • Requirements: lmdb, pillow, nltk, natsort
pip3 install lmdb pillow nltk natsort
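
As a quick sanity check after installation, a short script along the following lines can report which dependencies are importable. This is only a sketch and not part of the repository: the helper name `check_dependencies` is hypothetical. Note that the `pillow` package is imported under the name `PIL`.

```python
import importlib.util

# Import names for the packages listed above (pillow is imported as PIL).
REQUIRED_MODULES = ["torch", "torchvision", "lmdb", "PIL", "nltk", "natsort"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it stays fast and does not require a GPU to be present.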

Dataset

Pretrained Model

Two pretrained models are provided (they will be updated when better models are trained):

  1. non-sensitive: covers the ten digits (0-9) and the 26 lowercase letters (a-z).
  2. sensitive: covers all readable characters.
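
For illustration, the two character sets can be written out as below. This is a sketch that assumes the repository keeps the convention of deep-text-recognition-benchmark, where the case-sensitive set is the printable ASCII characters without whitespace. The variable names are hypothetical.

```python
import string

# Case-insensitive set: ten digits followed by 26 lowercase letters.
NON_SENSITIVE_CHARS = string.digits + string.ascii_lowercase

# Assumed case-sensitive set: printable ASCII minus the six trailing
# whitespace characters (digits, upper/lowercase letters, punctuation).
SENSITIVE_CHARS = string.printable[:-6]

print(len(NON_SENSITIVE_CHARS), len(SENSITIVE_CHARS))
```

Under that assumption the non-sensitive set has 36 characters and the sensitive set has 94.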

Pretrained models can be downloaded here

Run demo

  • With the non-sensitive model
python demo.py --saved_model scatter-case-non-sensitive.pth --image_folder <path_to_image_folder>
  • With the sensitive model
python demo.py --saved_model scatter-case-sensitive.pth --sensitive --image_folder <path_to_image_folder>

Training and evaluation

Download the lmdb datasets for training and evaluation provided by deep-text-recognition-benchmark from here

Download the additional dataset SynthText_Add (SA) for training from here (includes raw images and lmdb format).

Training

python3 train.py --train_data data_lmdb_release/training --valid_data data_lmdb_release/validation --select_data MJ-ST-SA --batch_ratio 0.4-0.4-0.2 --sensitive 
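
The `--select_data` and `--batch_ratio` arguments pair up position by position: each dataset name gets the corresponding share of every training batch. The sketch below shows that arithmetic. It assumes that `train.py` splits batches the same way as the upstream deep-text-recognition-benchmark code (rounding each share, with a minimum of one sample); the function name `split_batch` and the batch size of 192 are illustrative assumptions, not values taken from the command above.

```python
def split_batch(select_data, batch_ratio, batch_size):
    """Return the number of samples drawn from each dataset per batch."""
    names = select_data.split("-")
    ratios = [float(r) for r in batch_ratio.split("-")]
    if len(names) != len(ratios):
        raise ValueError("select_data and batch_ratio must have the same length")
    # Round each share; every dataset contributes at least one sample.
    return {n: max(round(batch_size * r), 1) for n, r in zip(names, ratios)}


# batch_size=192 is an assumed value for illustration only.
print(split_batch("MJ-ST-SA", "0.4-0.4-0.2", 192))
# prints {'MJ': 77, 'ST': 77, 'SA': 38}
```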

Testing

python3 test.py --eval_data data_lmdb_release/evaluation --saved_model scatter-case-sensitive.pth --sensitive --data_filtering_off

Reported results

  • Using evaluation set here

  • Comparison with the results reported in the original paper and with the baseline model.

Model                 | IIIT5K | SVT  | IC03 | IC13 | Regular Text | IC15 | SVTP | CUTE | Irregular Text
Paper (non-sensitive) | 93.7   | 92.7 | 96.3 | 93.9 | 94.0         | 82.2 | 86.9 | 87.5 | 83.7
Baseline              | 87.9   | 87.5 | 94.9 | 92.3 | 89.8         | 71.8 | 79.2 | 74.0 | 73.6
Ours (sensitive)      | 93.5   | 90.9 | 95.0 | 93.6 | 93.4         | 78.6 | 83.4 | 83.3 | 80.0
Ours (non-sensitive)  | 93.8   | 90.9 | 95.3 | 93.8 | 93.7         | 79.7 | 85.0 | 86.1 | 81.5
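
To make the comparison easier to read, the gap between this implementation's non-sensitive model and the paper can be computed directly from the table. The snippet below is a small sketch that copies the two rows verbatim and assumes the values are accuracies in percent; the variable names are illustrative.

```python
COLUMNS = ["IIIT5K", "SVT", "IC03", "IC13", "Regular Text",
           "IC15", "SVTP", "CUTE", "Irregular Text"]
PAPER = [93.7, 92.7, 96.3, 93.9, 94.0, 82.2, 86.9, 87.5, 83.7]
OURS_NON_SENSITIVE = [93.8, 90.9, 95.3, 93.8, 93.7, 79.7, 85.0, 86.1, 81.5]

# Difference in points (this implementation minus the paper), rounded to
# one decimal to hide floating-point noise.
gaps = {c: round(o - p, 1) for c, o, p in zip(COLUMNS, OURS_NON_SENSITIVE, PAPER)}

for column, gap in gaps.items():
    print(f"{column:15s} {gap:+.1f}")
```

On these numbers the regular-text benchmarks are within about two points of the paper, while the irregular-text benchmarks (IC15, SVTP, CUTE) show the larger gaps.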

Acknowledgements

This code is built upon deep-text-recognition-benchmark.

Reference

[1] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.
[2] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
[3] H. Li, P. Wang, C. Shen, and G. Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI, 2019.
[4] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.
[5] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.
[6] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.
[7] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.
[8] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.
[9] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.
[10] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.
[11] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{litman2020scatter,
  title={SCATTER: selective context attentional scene text recognizer},
  author={Litman, Ron and Anschel, Oron and Tsiper, Shahar and Litman, Roee and Mazor, Shai and Manmatha, R},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11962--11972},
  year={2020}
}