
Total Text Dataset - ICDAR 2017.


Released on October 27, 2017

Updated on November 04, 2017 (Text level groundtruth)

Updated on April 03, 2018 (Pixel level groundtruth)


To facilitate new research on text detection, we introduce the Total-Text dataset (ICDAR2017 paper) (presentation slides), which is more comprehensive than existing text datasets. Total-Text consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.


If you find this dataset useful for your research, please cite

  author    = {Chee Kheng Ch’ng and
               Chee Seng Chan},
  title     = {Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition},
  booktitle = {14th IAPR International Conference on Document Analysis and Recognition {ICDAR}},
  pages     = {},
  year      = {2017},
  doi       = {},


Suggestions and opinions on this dataset (both positive and negative) are very welcome. Please contact the authors by email at chngcheekheng at gmail.com or cs.chan at um.edu.my.


The Total-Text database is free to the academic community for research purposes only.

Copyright 2018, Center of Image and Signal Processing, Faculty of Computer Science and Information Technology, University of Malaya.