SceneTextPapers

Tracking the latest progress in Scene Text Detection and Recognition: must-read papers, well organized

Information about this repository

This repo serves as a complement to our working paper:

  • Scene Text Detection and Recognition: The Deep Learning Era. Shangbang Long, Xin He, Cong Yao. Link to Arxiv v1

This repository will be updated in the following days, together with our survey paper.

Papers

I. Other Survey Papers:

  1. Text localization and recognition in images and video. Uchida, Seiichi. 2014 paper
  2. Text detection and recognition in imagery: A survey. Ye, Qixiang and Doermann, David. 2015 paper
  3. Text detection, tracking and recognition in video: A comprehensive survey. Yin, Xu-Cheng and Zuo, Ze-Yu and Tian, Shu and Liu, Cheng-Lin. 2016 paper
  4. Scene text detection and recognition: Recent advances and future trends. Zhu, Yingying and Yao, Cong and Bai, Xiang. 2016 paper

II. Main: Scene Text Detection and Recognition

2.1 Detection

2.1.1 Pipeline Simplification
Anchor-based methods
  1. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. 2017 paper
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. 2017 paper
  3. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. 2017 paper
  4. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. 2017 paper
  5. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. 2017 paper
Region proposal methods
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Yuliang, Liu and Lianwen, Jin and Shuaitao, Zhang and Sheng, Zhang. 2017 paper
  2. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 paper
  3. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. 2017 paper
  4. Weakly supervised text attention network for generating text proposals in scene images. Rong, Li and MengYi, En and JianQiang, Li and HaiBin, Zhang. 2017 paper
  5. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. 2018 paper
  6. Feature Enhancement Network: A Refined Scene Text Detector. Sheng, Zhang and Yuliang, Liu and Lianwen, Jin and Canjie, Luo. 2017 paper
2.1.2 Different Prediction Units
Text instance level
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Yuliang, Liu and Lianwen, Jin and Shuaitao, Zhang and Sheng, Zhang. 2017 paper
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. 2017 paper
  3. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. 2017 paper
  4. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 paper
  5. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. 2017 paper
  6. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. 2017 paper
  7. Deep Direct Regression for Multi-Oriented Scene Text Detection. He, Wenhao and Zhang, Xu-Yao and Yin, Fei and Liu, Cheng-Lin. 2017 paper
  8. Fused Text Segmentation Networks for Multi-oriented Scene Text Detection. Dai, Yuchen and Huang, Zheng and Gao, Yuting and Chen, Kai. 2017 paper
  9. Feature Enhancement Network: A Refined Scene Text Detector. Sheng, Zhang and Yuliang, Liu and Lianwen, Jin and Canjie, Luo. 2017 paper
  10. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. 2018 paper
Bottom-up (Pixel)
  1. Scene text detection via holistic, multi-channel prediction. Yao, Cong and Bai, Xiang and Sang, Nong and Zhou, Xinyu and Zhou, Shuchang and Cao, Zhimin. 2016 paper
  2. Multi-oriented text detection with fully convolutional networks. Zhang, Zheng and Zhang, Chengquan and Shen, Wei and Yao, Cong and Liu, Wenyu and Bai, Xiang. 2016 paper
  3. Self-organized Text Detection with Minimal Post-processing via Border Learning. Wu, Yue and Natarajan, Prem. 2017 paper
  4. Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild. He, Dafang and Yang, Xiao and Liang, Chen and Zhou, Zihan and Ororbia, Alexander G and Kifer, Daniel and Giles, C Lee. 2017 paper
  5. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. 2017 paper
  6. PixelLink: Detecting Scene Text via Instance Segmentation. Dan, Deng and Haifeng, Liu and Xuelong, Li and Deng, Cai. 2018 paper
Bottom-up (Components)
  1. Detecting text in natural image with connectionist text proposal network. Tian, Zhi and Huang, Weilin and He, Tong and He, Pan and Qiao, Yu. 2016 paper
  2. Aggregating local context for accurate scene text detection. He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. 2016 paper
  3. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. 2017 paper
  4. Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. Wang, Cong and Yin, Fei and Liu, Cheng-Lin. 2017 paper
  5. Deep Residual Text Detection Network for Scene Text. Zhu, Xiangyu and Jiang, Yingying and Yang, Shuli and Wang, Xiaobing and Li, Wei and Fu, Pei and Wang, Hua and Luo, Zhenbo. 2017 paper
  6. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang. 2018 paper
2.1.3 Specific Targets
Long text
  1. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. 2017 paper
  2. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 paper
  3. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang. 2018 paper
Multi-oriented text
  1. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 paper
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. 2017 paper
  3. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. 2017 paper
  4. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. 2017 paper
  5. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. 2017 paper
  6. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. 2017 paper
  7. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. 2018 paper
  8. Geometry-Aware Scene Text Detection With Instance Transformation Network. Wang, Fangfang and Zhao, Liming and Li, Xi and Wang, Xinchao and Tao, Dacheng. 2018 paper
Irregular text
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Yuliang, Liu and Lianwen, Jin and Shuaitao, Zhang and Sheng, Zhang. 2017 paper
  2. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang. 2018 paper
  3. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. Long, Shangbang and Ruan, Jiaqiang and Zhang, Wenjie and He, Xin and Wu, Wenhao and Yao, Cong. 2018 paper
Speed up
  1. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. 2017 paper
Easy instance segmentation
  1. Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild. He, Dafang and Yang, Xiao and Liang, Chen and Zhou, Zihan and Ororbia, Alexander G and Kifer, Daniel and Giles, C Lee. 2017 paper
  2. Self-organized Text Detection with Minimal Post-processing via Border Learning. Wu, Yue and Natarajan, Prem. 2017 paper
  3. WordFence: Text Detection in Natural Images with Border Awareness. Polzounov, Andrei and Ablavatski, Artsiom and Escalera, Sergio and Lu, Shijian and Cai, Jianfei. 2017 paper
  4. PixelLink: Detecting Scene Text via Instance Segmentation. Dan, Deng and Haifeng, Liu and Xuelong, Li and Deng, Cai. 2018 paper
Retrieving designated text
  1. Unambiguous text localization and retrieval for cluttered scenes. Rong, Xuejian and Yi, Chucai and Tian, Yingli. 2017 paper
Against complex background
  1. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. 2017 paper

2.2 Recognition

2.2.1 CTC based methods
  1. Unconstrained on-line handwriting recognition with recurrent neural networks. Graves, Alex and Liwicki, Marcus and Bunke, Horst and Schmidhuber, Jurgen and Fernandez, Santiago. 2008 paper
  2. Accurate scene text recognition based on recurrent neural network. Su, Bolan and Lu, Shijian. 2014 paper
  3. STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition. Liu, Wei and Chen, Chaofeng and Wong, Kwan-Yee K and Su, Zhizhong and Han, Junyu. 2016 paper
  4. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. Shi, Baoguang and Bai, Xiang and Yao, Cong. 2017 paper
  5. Reading Scene Text with Attention Convolutional Sequence Modeling. Gao, Yunze and Chen, Yingying and Wang, Jinqiao and Lu, Hanqing. 2017 paper
  6. Scene Text Recognition with Sliding Convolutional Character Models. Yin, Fei and Wu, Yi-Chao and Zhang, Xu-Yao and Liu, Cheng-Lin. 2017 paper
2.2.2 Attention based methods
  1. Robust scene text recognition with automatic rectification. Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang. 2016 paper
  2. Recursive recurrent nets with attention modeling for ocr in the wild. Lee, Chen-Yu and Osindero, Simon. 2016 paper
  3. Visual attention models for scene text recognition. Ghosh, Suman K and Valveny, Ernest and Bagdanov, Andrew D. 2017 paper
  4. Focusing Attention: Towards Accurate Text Recognition in Natural Images. Cheng, Zhanzhan and Bai, Fan and Xu, Yunlu and Zheng, Gang and Pu, Shiliang and Zhou, Shuigeng. 2017 paper
  5. Learning to Read Irregular Text with Attention Mechanisms. Yang, Xiao and He, Dafang and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. 2017 paper
  6. Arbitrarily-Oriented Text Recognition. Cheng, Zhanzhan and Liu, Xuyang and Bai, Fan and Niu, Yi and Pu, Shiliang and Zhou, Shuigeng. 2017 paper
  7. SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network. Liu, Zichuan and Li, Yixing and Ren, Fengbo and Yu, Hao and Goh, Wangling. 2018 paper

2.3 End-to-End Text Spotting

2.3.1 Separately Trained Two-Stage Methods
  1. Reading text in the wild with convolutional neural networks. Jaderberg, Max and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew. 2016 paper
  2. Synthetic data for text localisation in natural images. Gupta, Ankush and Vedaldi, Andrea and Zisserman, Andrew. 2016 paper
  3. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. 2017 paper
2.3.2 Jointly Trained Two-Stage Methods
  1. SEE: Towards Semi-Supervised End-to-End Scene Text Recognition. Bartz, Christian and Yang, Haojin and Meinel, Christoph. 2017 paper
  2. Deep TextSpotter: An End-To-End Trainable Scene Text Localization and Recognition Framework. Busta, Michal and Neumann, Lukas and Matas, Jiri. 2017 paper
  3. Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks. Li, Hui and Wang, Peng and Shen, Chunhua. 2017 paper
  4. An End-to-End TextSpotter With Explicit Alignment and Attention. He, Tong and Tian, Zhi and Huang, Weilin and Shen, Chunhua and Qiao, Yu and Sun, Changming. 2018 paper
  5. FOTS: Fast Oriented Text Spotting with a Unified Network. Liu, Xuebo and Liang, Ding and Yan, Shi and Chen, Dagui and Qiao, Yu and Yan, Junjie. 2018 paper
  6. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang. 2018 paper

2.4 Auxiliary Techniques

2.4.1 Synthetic Data
  1. Synthetic data and artificial neural networks for natural scene text recognition. Jaderberg, Max and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew. 2014 paper
  2. Synthetic data for text localisation in natural images. Gupta, Ankush and Vedaldi, Andrea and Zisserman, Andrew. 2016 paper
  3. Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes. Zhan, Fangneng and Lu, Shijian and Xue, Chuhui. 2018 paper
2.4.2 Bootstrapping
  1. WeText: Scene text detection under weak supervision. Tian, Shangxuan and Lu, Shijian and Li, Chongshou. 2017 paper
  2. Weakly supervised text attention network for generating text proposals in scene images. Rong, Li and MengYi, En and JianQiang, Li and HaiBin, Zhang. 2017 paper
  3. WordSup: Exploiting word annotations for character based text detection. Hu, Han and Zhang, Chengquan and Luo, Yuxuan and Wang, Yuzhuo and Han, Junyu and Ding, Errui. 2018 paper
2.4.3 Deblurring
  1. Convolutional neural networks for direct text deblurring. Hradis, Michal and Kotera, Jan and Zemcik, Pavel and Sroubek, Filip. 2015 paper
  2. A blind deconvolution model for scene text detection and recognition in video. Khare, Vijeta and Shivakumara, Palaiahnakote and Raveendran, Paramesran and Blumenstein, Michael. 2016 paper
2.4.4 Context Information
  1. Could scene context be beneficial for scene text detection? Zhu, Anna and Gao, Renwu and Uchida, Seiichi. 2016 paper
2.4.5 Adversarial Attack
  1. Adaptive Adversarial Attack on Scene Text Recognition. Yuan, Xiaoyong and He, Pan and Li, Xiaolin Andy. 2018 paper

III. Datasets

| Dataset (Year) | Image Num (train/test) | Text Num (train/test) | Orientation | Language | Characteristics | Detection/Recognition Task |
|---|---|---|---|---|---|---|
| End2End | | | | | | |
| ICDAR03 (2003) | 509 (258/251) | 2276 (1110/1156) | Horizontal | En | - | ✓/✓ |
| ICDAR13 Scene Text (2013) | 462 (229/233) | - (848/1095) | Horizontal | En | - | ✓/✓ |
| ICDAR15 Incidental Text (2015) | 1500 (1000/500) | - (-/-) | Multi-Oriented | En | Blur, Small, Defocused | ✓/✓ |
| ICDAR17 / RCTW (2017) | 12263 (8034/4229) | - (-/-) | Multi-Oriented | Chinese | - | ✓/✓ |
| Total-Text (2017) | 1555 (1255/300) | - (-/-) | Multi-Oriented, Curved | En, Ch | Irregular polygon label | ✓/✓ |
| SVT (2010) | 350 (100/250) | 904 (257/647) | Horizontal | En | - | ✓/✓ |
| KAIST (2010) | 3000 (-/-) | 5000 (-/-) | Horizontal | En, Ko | Distorted | ✓/✓ |
| NEOCR (2011) | 659 (-/-) | 5238 (-/-) | Multi-Oriented | 8 langs | - | ✓/✓ |
| CUTE (2014) | 80 (-/80) | - (-/-) | Curved | En | - | ✓/✓ |
| CTW (2017) | 32K (25K/6K) | 1M (812K/205K) | Multi-Oriented | Chinese | Fine-grained annotation | ✓/✓ |
| Detection Only | | | | | | |
| OSTD (2011) | 89 (-/-) | 218 (-/-) | Multi-Oriented | En | - | ✓/- |
| MSRA-TD500 (2012) | 500 (300/200) | 1719 (1068/651) | Multi-Oriented | En, Ch | Long text | ✓/- |
| HUST-TR400 (2014) | 400 (400/-) | - (-/-) | Multi-Oriented | En, Ch | Long text | ✓/- |
| ICDAR17 / RRC-MLT (2017) | 18000 (9000/9000) | - (-/-) | Multi-Oriented | 9 langs | - | ✓/- |
| CTW1500 (2017) | 1500 (1000/500) | - (-/-) | Multi-Oriented, Curved | En | Bounding box with 14 vertexes | ✓/- |
| Recognition Only | | | | | | |
| Char74k (2009) | 74107 (-/-) | 74107 (-/-) | Horizontal | En, Kannada | Character label | -/✓ |
| IIIT 5K-Word (2012) | 5000 (-/-) | 5000 (2000/3000) | Horizontal | - | Cropped | -/✓ |
| SVHN (2010) | - (-/-) | 600000 (-/-) | Horizontal | - | House number digits | -/✓ |
| SVTP (2013) | 639 (-/639) | - (-/-) | | En | Distorted | -/✓ |