SCUT-FORU: Flickr OCR Universal Database
Flickr OCR Universal Database (abbreviated as FORU and pronounced like for you) is collected from Flickr website https://www.flickr.com/ and now released by Human Computer Intelligent Interaction Lab of South China University of Technology. The database can be downloaded by https://www.dropbox.com/s/06wfn5ugt5v3djs/SCUT_FORU_DB_Release.rar?dl=0.
1 Database Organization
FORU contains two parts, which are Chinese2k and English2k dataset, respectively.
(a) English2k dataset
English2k dataset is consists of two subsets, which one subset is suitable for character detection and the other one is suitable for word detection. The character detection subset has 813 training images and 349 testing images. The detailed annotations are presented in Table 1. On the other hand, the word detection subset has 1200 training images and 515 testing images. The detailed annotations are presented in Table 2.
Table 1 Character Level Bounding boxes
Training Bounding box Instances | Testing Bounding box Instances | Total Bounding box Instances
English2k 14888 | 6506 | 21394
Table 2 Word Level Bounding boxes
Training Bounding box Instances | Testing Bounding box Instances | Total Bounding box Instances
English2k 3840 | 1581 | 5421
Table 3 Annotations of Chinese2k dataset
Training Bounding box Instances | Testing Bounding box Instances | Total Bounding box Instances
Chinese2k 23768 | 4626 | 28394
(b) Chinese2k dataset
The Chinese2k dataset has 1861 training images and 355 testing images. The detailed annotation is presented in Table 3.
2 Format of ground truth files
Each image in the database corresponds to a ground truth file, in which each line records the location and label of one character/word. The format of the ground truth files is illustrated as {x, y, w, h, label}. {x, y} denotes the top-left coordinate of the annotated bounding box, and {w, h} denotes the size of the annotated bounding box.
3 Database Usage
The samples of SCUT-FORU dataset have a great diversity. The scenes include streets, buildings, shops, offices, restaurants, railway stations, subway, etc. The texts include traffic signs, road signs, book covers, outdoor advertising, signs, and so on. Various light conditions are also considered, such as sunny and cloudy, day and night, and so on. It is worth noting that most text directions are horizontal. This database should be used for scene text detection and recognition. Specifically, it is available in the field of (A) cropped character recognition; (B) cropped word recognition and (C) full image word detection and recognition.
4 Contact
Please consider to cite our paper Shuye Zhang, Mude Lin, Tianshui Chen, Lianwen Jin*, Liang Lin, “CHARACTER PROPOSAL NETWORK FOR ROBUST TEXT DETECTION” when you use our database. Please feel free to modify the dataset when it is needed. For any questions about this database please contact shuye.cheung@gmail.com or lianwen.jin@gmail.com.