Scene Text

A curated list of papers and resources for scene text detection and recognition

Papers are listed under the year they were first published, including ArXiv preprints. As a result, a paper accepted to, for example, CVPR 2019 may be listed under 2018 because it first appeared on ArXiv in 2018.

Table of contents
1. Scene Text Detection
2. Weakly Supervised Scene Text Detection
3. Scene Text Recognition
4. Other scene text papers
5. Scene Text Survey papers
6. Dataset

Scene Text Detection (including methods for end-to-end detection and recognition)

2010

Detecting text in natural scenes with stroke width transform [CVPR 2010] [paper]
A Method for Text Localization and Recognition in Real-World Images [ACCV 2010] [paper]

2011

2012

Real-time scene text localization and recognition [CVPR 2012] [paper]

2013

2014

Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees [ECCV 2014] [paper]

2015

Symmetry-based text line detection in natural scenes [CVPR 2015] [paper]
Object proposals for text extraction in the wild [ICDAR 2015] [paper]
Text-Attentional Convolutional Neural Network for Scene Text Detection [TIP 2016] [paper]
Text Flow: A Unified Text Detection System in Natural Scene Images [ICCV 2015] [paper]

2016

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network [ArXiv] [paper]
Multi-Oriented Text Detection With Fully Convolutional Networks [CVPR 2016] [paper]
Scene Text Detection Via Holistic, Multi-Channel Prediction [ArXiv] [paper]
Detecting Text in Natural Image with Connectionist Text Proposal Network [ECCV 2016] [paper]
TextBoxes: A Fast Text Detector with a Single Deep Neural Network [AAAI 2017] [paper]
- https://github.com/MhLiao/TextBoxes [Caffe]
- https://github.com/shinjayne/shinTB [TF]

2017

Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild [CVPR 2017] [paper]
Deep TextSpotter: An End-To-End Trainable Scene Text Localization and Recognition Framework [ICCV 2017] [paper]
Arbitrary-Oriented Scene Text Detection via Rotation Proposals [TMM 2018] [paper]
- https://github.com/mjq11302010044/RRPN [Caffe]
Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection [CVPR 2017] [paper]
Detecting Oriented Text in Natural Images by Linking Segments [CVPR 2017] [paper]
- https://github.com/bgshih/seglink [TF]
- https://github.com/dengdan/seglink [TF]
Deep Direct Regression for Multi-Oriented Scene Text Detection [ICCV 2017] [paper]
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting [ArXiv] [paper]
EAST: An Efficient and Accurate Scene Text Detector [CVPR 2017] [paper]
- https://github.com/argman/EAST [TF]
- https://github.com/kurapan/EAST [Keras]
WordFence: Text Detection in Natural Images with Border Awareness [ICIP 2017] [paper]
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection [ArXiv] [paper]
- https://github.com/DetectionTeamUCAS/R2CNN_Faster-RCNN_Tensorflow [TF]
- https://github.com/beacandler/R2CNN [Caffe]
WordSup: Exploiting Word Annotations for Character based Text Detection [ICCV 2017] [paper]
Single Shot Text Detector With Regional Attention [ICCV 2017] [paper]
- https://github.com/BestSonny/SSTD [Caffe]
- https://github.com/HotaekHan/SSTDNet [PyTorch]
Fused Text Segmentation Networks for Multi-oriented Scene Text Detection [ArXiv] [paper]
Deep Residual Text Detection Network for Scene Text [ICDAR 2017] [paper]
Feature Enhancement Network: A Refined Scene Text Detector [AAAI 2018] [paper]
ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene [ArXiv] [paper]
Self-organized Text Detection with Minimal Post-processing via Border Learning [ICCV 2017] [paper]
- https://gitlab.com/rex-yue-wu/ISI-PPT-Text-Detector [Keras]

2018

PixelLink: Detecting Scene Text via Instance Segmentation [AAAI 2018] [paper]
- https://github.com/ZJULearning/pixel_link [TF]
- https://github.com/BowieHsu/tensorflow_ocr [TF]
FOTS: Fast Oriented Text Spotting With a Unified Network [CVPR 2018] [paper]
TextBoxes++: A Single-Shot Oriented Scene Text Detector [TIP 2018] [paper]
- https://github.com/MhLiao/TextBoxes_plusplus [Caffe]
Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation [CVPR 2018] [paper]
An end-to-end TextSpotter with Explicit Alignment and Attention [CVPR 2018] [paper]
- https://github.com/tonghe90/textspotter [Caffe]
Rotation-Sensitive Regression for Oriented Scene Text Detection [CVPR 2018] [paper]
- https://github.com/MhLiao/RRD [Caffe]
Detecting multi-oriented text with corner-based region proposals [Neurocomputing 2019] [paper]
- https://github.com/xhzdeng/crpn [Caffe]
An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches [ArXiv] [paper]
IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection [IJCAI 2018] [paper]
- https://github.com/xieyufei1993/InceptText-Tensorflow [TF]
Shape Robust Text Detection with Progressive Scale Expansion Network [CVPR 2019] [paper] [paper v2]
- https://github.com/whai362/PSENet [PyTorch]
- https://github.com/liuheng92/tensorflow_PSENet [TF]
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes [ECCV 2018] [paper]
- https://github.com/princewang1994/TextSnake.pytorch [PyTorch]
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes [ECCV 2018] [paper]
- https://github.com/lvpengyuan/masktextspotter.caffe2 [Caffe2]
Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping [ECCV 2018] [paper]
A New Anchor-Labeling Method For Oriented Text Detection Using Dense Detection Framework [SPL 2018] [paper]
An Efficient System for Hazy Scene Text Detection using a Deep CNN and Patch-NMS [ICPR 2018] [paper]
Scene Text Detection with Supervised Pyramid Context Network [AAAI 2019] [paper]
Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks [ArXiv] [paper]
Mask R-CNN with Pyramid Attention Network for Scene Text Detection [WACV 2019] [paper]
TextMountain: Accurate Scene Text Detection via Instance Segmentation [ArXiv] [paper]
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection [ArXiv] [paper]
TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network [ACCV 2018] [paper]

2019

MSR: Multi-Scale Shape Regression for Scene Text Detection [IJCAI 2019] [paper]
Scene Text Detection with Inception Text Proposal Generation Module [ICMLC 2019] [paper]
Towards Robust Curve Text Detection with Conditional Spatial Expansion [CVPR 2019] [paper]
Curve Text Detection with Local Segmentation Network and Curve Connection [ArXiv] [paper]
Pyramid Mask Text Detector [ArXiv] [paper]
Tightness-aware Evaluation Protocol for Scene Text Detection [CVPR 2019] [paper]
Character Region Awareness for Text Detection [CVPR 2019] [paper]
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes [CVPR 2019] [paper]
TextCohesion: Detecting Text for Arbitrary Shapes [ArXiv] [paper]
Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation [CVPR 2019] [paper]
Learning Shape-Aware Embedding for Scene Text Detection [CVPR 2019] [paper]
A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning [ACMMM 2019] [paper]
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network [ICCV 2019] [paper]
Towards Unconstrained End-to-End Text Spotting [ICCV 2019] [paper]
TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting [paper]
Convolutional Character Networks [ICCV 2019] [paper]

Weakly supervised Scene Text Detection & Recognition

2017

Attention-Based Extraction of Structured Information from Street View Imagery [ICDAR 2017] [paper]
WeText: Scene Text Detection under Weak Supervision [ICCV 2017] [paper]
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition [AAAI 2018] [paper]
- https://github.com/Bartzi/see [Chainer]

Scene Text Recognition

2014

Deep Structured Output Learning for Unconstrained Text Recognition [ICLR 2015] [paper]
- https://github.com/AlexandreSev/Structured_Data [TF]
Reading text in the wild with convolutional neural networks [IJCV 2016] [paper]
- https://github.com/mathDR/reading-text-in-the-wild [Keras]

2015

Reading Scene Text in Deep Convolutional Sequences [AAAI 2016] [paper]
An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition [TPAMI 2017] [paper]
- https://github.com/bgshih/crnn [Torch]
- https://github.com/weinman/cnn_lstm_ctc_ocr [TF]
- https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow [TF]
- https://github.com/MaybeShewill-CV/CRNN_Tensorflow [TF]
- https://github.com/meijieru/crnn.pytorch [PyTorch]
- https://github.com/kurapan/CRNN [Keras]

2016

Recursive Recurrent Nets with Attention Modeling for OCR in the Wild [CVPR 2016] [paper]
Robust scene text recognition with automatic rectification [CVPR 2016] [paper]
- https://github.com/WarBean/tps_stn_pytorch [PyTorch]
- https://github.com/marvis/ocr_attention [PyTorch]
CNN-N-Gram for Handwriting Word Recognition [CVPR 2016] [paper]
STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition [BMVC 2016] [paper]

2017

STN-OCR: A single Neural Network for Text Detection and Text Recognition [ArXiv] [paper]
- https://github.com/Bartzi/stn-ocr [MXNet]
Learning to Read Irregular Text with Attention Mechanisms [IJCAI 2017] [paper]
Scene Text Recognition with Sliding Convolutional Character Models [ArXiv] [paper]
Focusing Attention: Towards Accurate Text Recognition in Natural Images [ICCV 2017] [paper]
AON: Towards Arbitrarily-Oriented Text Recognition [CVPR 2018] [paper]
- https://github.com/huizhang0110/AON [TF]
Gated Recurrent Convolution Neural Network for OCR [NIPS 2017] [paper]
- https://github.com/Jianfeng1991/GRCNN-for-OCR [Torch]

2018

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition [AAAI 2018] [paper]
SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network [AAAI 2018] [paper]
Edit Probability for Scene Text Recognition [CVPR 2018] [paper]
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification [TPAMI 2018] [paper]
- https://github.com/bgshih/aster [TF]
Synthetically Supervised Feature Learning for Scene Text Recognition [ECCV 2018] [paper]
Scene Text Recognition from Two-Dimensional Perspective [AAAI 2019] [paper]
ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification [CVPR 2019] [paper]

2019

A Multi-Object Rectified Attention Network for Scene Text Recognition [Pattern Recognition] [paper]
- https://github.com/Canjie-Luo/MORAN_v2 [PyTorch]
A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition [paper]
Aggregation Cross-Entropy for Sequence Recognition [CVPR 2019] [paper]
- https://github.com/summerlvsong/Aggregation-Cross-Entropy [PyTorch]
Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition [CVPR 2019] [paper]
2D Attentional Irregular Scene Text Recognizer [ArXiv] [paper]
Deep Neural Network for Semantic-based Text Recognition in Images [ArXiv] [paper]
Symmetry-constrained Rectification Network for Scene Text Recognition [ICCV 2019] [paper]
Rethinking Irregular Scene Text Recognition (ICDAR 2019-ArT) [paper]
- https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy [PyTorch]
Focus-Enhanced Scene Text Recognition with Deformable Convolutions [ArXiv] [paper]
- https://github.com/Alpaca07/dtr [PyTorch]
Adaptive Embedding Gate for Attention-Based Scene Text Recognition [ArXiv] [paper]

Script Identification

Other scene text related papers

2016

Synthetic Data for Text Localisation in Natural Images [CVPR 2016] [paper]
- https://github.com/ankush-me/SynthText

2019

Scene Text Synthesis for Efficient and Effective Deep Network Training [ArXiv] [paper]

Scene text survey

2018

Scene Text Detection and Recognition: The Deep Learning Era [ArXiv] [paper]

2019

Scene text detection and recognition with advances in deep learning: a survey [IJDAR 2019] [paper]

Dataset

PowerPoint Text Detection and Recognition Dataset 2017

CORD 2020

Over 1k receipt images (A Consolidated Receipt Dataset for Post-OCR Parsing)
Task: text recognition

SROIE 2019

Over 1062 images of scanned receipts
Task: text location and recognition

COCO-Text (Computer Vision Group, Cornell) 2016

63,686 images, 173,589 text instances, 3 fine-grained text attributes.
Task: text location and recognition

COCO-Text API

Synthetic Data for Text Localisation in Natural Images (VGG) 2016

800 thousand images
8 million synthetic word instances
download

Synthetic Word Dataset (Oxford, VGG) 2014

9 million images covering 90k English words
Task: text recognition, segmentation
download

IIIT 5K-Words 2012

5000 images from scene text and born-digital sources (2k training and 3k testing images)
Each image is a cropped word image of scene text with case-insensitive labels
Task: text recognition
download

StanfordSynth (Stanford, AI Group) 2012

Small single-character images of 62 characters (0-9, a-z, A-Z)
Task: text recognition
download

MSRA Text Detection 500 Database (MSRA-TD500) 2012

500 natural images (resolutions of the images vary from 1296x864 to 1920x1280)
Chinese, English, or a mixture of both
Task: text detection

Street View Text (SVT) 2010

350 high resolution images (average size 1260 × 860) (100 images for training and 250 images for testing)
Only word level bounding boxes are provided with case-insensitive labels
Task: text location

KAIST Scene_Text Database 2010

3000 images of indoor and outdoor scenes containing text
Korean, English (Number), and Mixed (Korean + English + Number)
Task: text location, segmentation and recognition

Chars74k 2009

Over 74K images from natural images, as well as a set of synthetically generated characters
Small single-character images of 62 characters (0-9, a-z, A-Z)
Task: text recognition

ICDAR Benchmark Datasets

Dataset	Description	Competition Paper
ICDAR 2019	training and testing images	`paper`
ICDAR 2017	42618 training images and 9837 testing images	`paper`
ICDAR 2015	1000 training images and 500 testing images	`paper`
ICDAR 2013	229 training images and 233 testing images	`paper`
ICDAR 2011	229 training images and 255 testing images	`paper`
ICDAR 2005	1001 training images and 489 testing images	`paper`
ICDAR 2003	181 training images and 251 testing images(word level and character level)	`paper`

Blogs

Online Service

Name	Description
Online OCR	API, free
Free OCR	API, free
New OCR	API, free
ABBYY FineReader Online	Non-API, free

Open Resources Code

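Chinese natural-scene text detection and recognition based on YOLOv3 and CRNN [code]
Ultra-lightweight Chinese OCR with support for vertical text recognition and ncnn inference: psenet (8.5M) + crnn (6.3M) + anglenet (1.5M), total model size only 17M [code]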
Tesseract: C++-based tools for document analysis and OCR [code]
Ocropy: Python-based tools for document analysis and OCR https://github.com/tmbdev/ocropy
CLSTM: A small implementation of LSTM networks, focused on OCR https://github.com/tmbdev/clstm
Convolutional Recurrent Neural Network (Torch7) https://github.com/bgshih/crnn
Attention-OCR: Visual attention-based OCR https://github.com/da03/Attention-OCR
Umaru: An OCR system based on Torch that uses LSTM/GRU-RNN and CTC, drawing on the work of rnnlib and clstm https://github.com/edward-zhu/umaru
AKSHAYUBHAT/DeepVideoAnalytics (CTPN+CRNN) [code]
ankush-me/SynthText [code]
JarveeLee/SynthText_Chinese_version [code]

Hand Writing Recognition

[2016-arXiv] Drawing and Recognizing Chinese Characters with Recurrent Neural Network https://arxiv.org/abs/1606.06539
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition https://arxiv.org/abs/1610.02616
Stroke Sequence-Dependent Deep Convolutional Neural Network for Online Handwritten Chinese Character Recognition https://arxiv.org/abs/1610.04057
High Performance Offline Handwritten Chinese Character Recognition Using GoogLeNet and Directional Feature Maps http://arxiv.org/abs/1505.04925
DeepHCCR: Offline Handwritten Chinese Character Recognition based on GoogLeNet and AlexNet (With CaffeModel) https://github.com/chongyangtao/DeepHCCR
Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention http://arxiv.org/abs/1604.03286
MLPaint: the Real-Time Handwritten Digit Recognizer http://blog.mldb.ai/blog/posts/2016/09/mlpaint/
caffe-ocr: OCR with caffe deep learning framework https://github.com/pannous/caffe-ocr

Licence Tag Recognition

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs
Numberplate recognition with Tensorflow http://matthewearl.github.io/2016/05/06/cnn-anpr/
end-to-end-for-plate-recognition https://github.com/szad670401/end-to-end-for-chinese-plate-recognition
http://rnd.azoft.com/applying-ocr-technology-receipt-recognition/

amit-code/ocr_documentation
