A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.

License: Apache License 2.0 (Apache-2.0)

awesome-deep-text-detection-recognition

A curated list of awesome deep learning based papers on text detection and recognition.

Text Detection

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is the F1-score for the localization task.
    • (L) marks a score taken from the leaderboard.
    • An (L) score is listed when the leaderboard score differs from the one reported in the paper.
  • *CODE marks official code, and CODE(M) means that a trained model is provided.
Conf. Date Title IC13 IC15 Resources
'14-ECCV 14/10/07 Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees
'15-CVPR 15/06/01 Symmetry-based text line detection in natural scenes 0.8043 PRJ
CODE
'16-TIP 15/10/12 Text-Attentional Convolutional Neural Networks for Scene Text Detection 0.8165
'15-ICCV 15/12/13 Text Flow: A Unified Text Detection System in Natural Scene Images 0.8025
'16-arXiv 16/03/31 Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network 0.86
'16-CVPR 16/04/14 Multi-Oriented Text Detection with Fully Convolutional Networks 0.83 0.54 *TORCH(M)
'16-CVPR 16/04/22 Synthetic Data for Text Localisation in Natural Images 0.847
(L)0.8359
CODE
DB
'16-arXiv 16/06/29 Scene Text Detection Via Holistic, Multi-Channel Prediction 0.8433 0.6477
'16-ECCV 16/09/12 Detecting Text in Natural Image with Connectionist Text Proposal Network 0.8215 0.6085 *CAFFE(M)
CAFFE
TF(M)
DEMO
BLOG(CH)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.85
(L)0.8767
*CAFFE(M)
TF
BLOG(KR)
'18-TM 17/03/03 Arbitrary-Oriented Scene Text Detection via Rotation Proposals 0.9125 0.8020 *CAFFE
'17-CVPR 17/03/04 Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection 0.7064
'17-CVPR 17/03/19 Detecting Oriented Text in Natural Images by Linking Segments 0.853 0.75
(L)0.7636
*TF(M)
TF(M)
SLIDE
VIDEO
'17-arXiv 17/03/24 Deep Direct Regression for Multi-Oriented Scene Text Detection 0.86 0.81
'17-arXiv 17/04/03 Cascaded Segmentation-Detection Networks for Word-Level Text Spotting 0.86 0.71
'17-CVPR 17/04/11 EAST: An Efficient and Accurate Scene Text Detector 0.8072
(L)0.8038
TF(M)
TF
DEMO
VIDEO
'17-ICIP 17/05/15 WordFence: Text Detection in Natural Images with Border Awareness 0.86
'17-arXiv 17/06/30 R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection 0.8773 0.8254 TF(M)
CAFFE(M)
'17-CVPR 17/07/21 Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild 0.85 0.63
'17-arXiv 17/08/17 Deep Scene Text Detection with Connected Component Proposals 0.919
'17-ICCV 17/08/22 WordSup: Exploiting Word Annotations for Character based Text Detection 0.9064 0.7816
'17-ICCV 17/09/01 Single Shot Text Detector with Regional Attention 0.8704 0.7691 *CAFFE(M)
PYTORCH
VIDEO
'17-arXiv 17/09/11 Fused Text Segmentation Networks for Multi-oriented Scene Text Detection 0.8414
'17-ICCV 17/10/13 WeText: Scene Text Detection under Weak Supervision 0.869
(L)0.8313
'17-ICCV 17/10/22 Self-organized Text Detection with Minimal Post-processing via Border Learning 0.84 *KERAS(M)
'17-ICDAR 17/11/11 Deep Residual Text Detection Network for Scene Text 0.9117
(L)0.8925
'18-AAAI 17/11/12 Feature Enhancement Network: A Refined Scene Text Detector 0.9161
'17-arXiv 17/11/30 ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene 0.759
'18-AAAI 18/01/04 PixelLink: Detecting Scene Text via Instance Segmentation 0.881 0.8519 *TF(M) TF
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.925 0.8984 PYTORCH
VIDEO
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.88 0.829
(L)0.8475
*CAFFE(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.9 0.87 *CAFFE(M)
'18-CVPR 18/03/14 Rotation-Sensitive Regression for Oriented Scene Text Detection 0.89 0.838 *CAFFE(M)
'18-arXiv 18/04/08 Detecting Multi-Oriented Text with Corner-based Region Proposals 0.876 0.845 *CAFFE(M)
'18-arXiv 18/04/24 An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches 0.92 0.86
'18-IJCAI 18/05/03 IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection 0.9047
'18-arXiv 18/06/07 Shape Robust Text Detection with Progressive Scale Expansion Network 0.8721 PRJ
'18-ECCV 18/07/04 TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes 0.826
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.917 0.86
'18-ECCV 18/07/10 Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping 0.892
'19-AAAI 18/11/21 Scene Text Detection with Supervised Pyramid Context Network 0.921 0.872
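
The F1-score reported in the table above is the harmonic mean of precision and recall. A minimal sketch of that computation follows, assuming the number of correctly matched detections is already known. How a detection is matched to a ground-truth box depends on each benchmark's evaluation protocol and is not modelled here. The function names and the counts are illustrative and do not come from any of the listed papers.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_f1(num_matched: int, num_detected: int, num_ground_truth: int) -> float:
    """Compute F1 from raw counts.

    num_matched is the number of detections accepted as correct by the
    benchmark's matching rule (assumed to be given; not computed here).
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return f1_score(precision, recall)


if __name__ == "__main__":
    # Made-up counts: 8 correct detections out of 10, with 16 ground-truth boxes.
    print(round(detection_f1(8, 10, 16), 4))
```

Because the matching rule differs between benchmarks, scores in the IC13 and IC15 columns are not directly comparable with each other.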

Text Recognition

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is word accuracy for the recognition task.
    • For results on the IC03, IC13, and IC15 datasets, papers evaluated on different numbers of samples,
      but we did not distinguish between them.
  • *CODE marks official code, and CODE(M) means that a trained model is provided.
Conf. Date Title SVT IIIT5k IC03 IC13 Resources
'15-ICLR 14/12/18 Deep structured output learning for unconstrained text recognition 0.717 0.896 0.818 TF
SLIDE
VIDEO
'16-IJCV 15/05/07 Reading text in the wild with convolutional neural networks 0.807 0.933 0.908 KERAS
'16-AAAI 15/06/14 Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI 15/07/21 An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition 0.808 0.782 0.894 0.867 TORCH(M)
TF
TF
TF
PYTORCH
BLOG(KR)
'16-CVPR 16/03/09 Recursive Recurrent Nets with Attention Modeling for OCR in the Wild 0.807 0.784 0.887 0.9
'16-CVPR 16/03/12 Robust scene text recognition with automatic rectification 0.819 0.819 0.901 0.886 PYTORCH
PYTORCH
'16-CVPR 16/06/27 CNN-N-Gram for Handwriting Word Recognition 0.8362 VIDEO
'16-BMVC 16/09/19 STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition 0.836 0.833 0.899 0.891
'17-arXiv 17/07/27 STN-OCR: A single Neural Network for Text Detection and Text Recognition 0.798 0.86 0.903 *MXNET(M)
PRJ
BLOG
'17-IJCAI 17/08/19 Learning to Read Irregular Text with Attention Mechanisms
'17-arXiv 17/09/06 Scene Text Recognition with Sliding Convolutional Character Models 0.765 0.816 0.845 0.852
'17-ICCV 17/09/07 Focusing Attention: Towards Accurate Text Recognition in Natural Images 0.859 0.874 0.942 0.933
'18-CVPR 17/11/12 AON: Towards Arbitrarily-Oriented Text Recognition 0.828 0.87 0.915
'17-NIPS 17/12/04 Gated Recurrent Convolution Neural Network for OCR 0.815 0.808 0.978 *TORCH(M)
'18-AAAI 18/01/04 Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition 0.844 0.836 0.915 0.908
'18-AAAI 18/01/04 SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network 0.87 0.931 0.929
'18-CVPR 18/05/09 Edit Probability for Scene Text Recognition 0.875 0.883 0.946 0.944
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.936 0.934 0.945 0.918 *TF(M)
'18-ECCV 18/09/08 Synthetically Supervised Feature Learning for Scene Text Recognition 0.871 0.894 0.947 0.94
'19-AAAI 18/09/18 Scene Text Recognition from Two-Dimensional Perspective 0.821 0.92 0.914
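
Word accuracy, the score used in the table above, is the fraction of word images whose predicted string exactly matches the ground truth. The sketch below shows one way to compute it. The function name, the sample strings, and the case-insensitive default are assumptions made for illustration; individual papers may apply different normalization (for example, lexicon constraints or character filtering).

```python
from typing import Sequence


def word_accuracy(
    predictions: Sequence[str],
    ground_truths: Sequence[str],
    case_sensitive: bool = False,  # assumption: case is ignored by default
) -> float:
    """Return the fraction of predictions that exactly match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    if not case_sensitive:
        predictions = [p.lower() for p in predictions]
        ground_truths = [g.lower() for g in ground_truths]
    # A word counts as correct only if the whole string matches.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


if __name__ == "__main__":
    # Made-up example: two of the three words match when case is ignored.
    preds = ["hello", "w0rld", "Text"]
    truths = ["hello", "world", "text"]
    print(word_accuracy(preds, truths))
```

A single wrong character makes the whole word count as an error, which is why this metric is stricter than character-level accuracy.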

End-to-End Text Recognition

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is the F1-score for the generic task.
  • *CODE marks official code, and CODE(M) means that a trained model is provided.
Conf. Date Title IC03 IC13 IC15 Resources
'12-ICPR 12/11/11 End-to-end text recognition with convolutional neural networks 0.67 *CODE
'14-ECCV 14/09/06 Deep Features for Text Spotting 0.75 PRJ
MATLAB
'15-IJCV 15/05/07 Reading Text in the Wild with Convolutional Neural Networks 0.70 0.77 KERAS
'15-TPAMI 15/10/30 Real-time Lexicon-free Scene Text Localization and Recognition 0.542 0.156
'16-arXiv 16/04/10 TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild 0.6843 0.4718
(L)0.533
*CAFFE(M)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.84 TF
*CAFFE(M)
BLOG(KR)
'17-ICCV 17/07/13 Towards End-to-end Text Spotting with Convolution Recurrent Neural Network 0.8459 VIDEO
'17-ICCV 17/10/22 Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework 0.77 0.47 VIDEO
*CAFFE(M)
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.8477 0.6533 VIDEO
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.8465 0.519 *CAFFE(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.86 0.63 *CAFFE(M)
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.64 *TF(M)
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.865 0.624

Others

  • Papers are sorted by published date.
  • *CODE marks official code, and CODE(M) means that a trained model is provided.
Conf. Date Title Description Resources
'14-NIPS 14/06/09 Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition Dataset PRJ
'17-ECCV 17/02/13 End-to-End Interpretation of the French Street Name Signs Dataset Dataset (FSNS) *TF(M)
'17-arXiv 17/04/11 Attention-based Extraction of Structured Information from Street View Imagery FSNS *TF(M)
TF
TF
LUA
BLOG(KR)
'17-CVPR 17/07/21 Unambiguous Text Localization and Retrieval for Cluttered Scenes Text Retrieval
'17-AAAI 17/10/22 Detection and Recognition of Text Embedded in Online Images via Neural Context Models Dataset PRJ
'18-CVPR 17/11/17 Separating Style and Content for Generalized Style Transfer Font Style
'17-arXiv 17/12/06 Detecting Curve Text in the Wild: New Dataset and New Solution Dataset (CTW 1500) PRJ
'18-AAAI 17/12/14 SEE: Towards Semi-Supervised End-to-End Scene Text Recognition FSNS PRJ
*CHAINER(M)
'17-CVPR 18/06/07 Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks Document Layout PRJ
'18-CVPR 18/06/19 DocUNet: Document Image Unwarping via A Stacked U-Net Document Dewarping PRJ
'18-CVPR 18/06/19 Document Enhancement using Visibility Detection Document Enhancement PRJ
'18-IJCAI 18/06/22 Multi-Task Handwritten Document Layout Analysis Document Layout
'18-ECCV 18/07/09 Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Dataset PRJ
'19-AAAI 18/12/03 EnsNet: Ensconce Text in the Wild Text Removal DB

Other lists

Tutorial Materials

Acknowledgment

  • This work is done by the OCR team at Clova AI, powered by NAVER-LINE. NAVER-LINE is a top Asian internet company that develops Clova, a cloud-based AI assistant platform.
  • This repository is scheduled to be updated regularly, following the schedules of major AI conferences.