Code for generating the CurvedSynth dataset used in our paper: Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition (to be uploaded to arXiv).
Rectification-based scene text recognizers trained on this dataset can achieve an improvement of more than 10% on Total-Text. For more details, please read our paper.
Below is the README of the original SynthText project. The generation of CurvedSynth follows the same pipeline as SynthText; we only altered the text rendering module, added multithreaded data generation, and added some visualization utilities.
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
Synthetic Scene-Text Image Samples
The library is written in Python. The main dependencies are:
pygame, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy
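As an optional sanity check before generating data, a short snippet like the following (not part of the pipeline) confirms that these packages import cleanly; the module names mirror the list above:

```python
# Optional sanity check: confirm that the main dependencies can be imported.
import importlib

for module in ["pygame", "cv2", "PIL.Image", "numpy", "matplotlib", "h5py", "scipy"]:
    try:
        importlib.import_module(module)
        print("OK:", module)
    except ImportError as err:
        print("MISSING:", module, "-", err)
```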
To generate data, run:

`source run_gen.sh`
All parameters (a rough sketch of how they fit together follows this list):
- begin_index: the index of the first background image.
- end_index: the index of the last background image.
- max_time: the time limit for generating a single image.
- gen_data_path: the directory that stores dset.h5, models, fonts and newsgroup.
- bg_data_path: the directory that stores imnames.cp, bg_img.tar.gz, depth.h5 and seg.h5.
- jobs: the number of worker processes.
- output_path: the absolute path where generated images are stored.
- instance_per_image: the number of synthetic images generated from each background image.
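For intuition only, here is a rough hypothetical sketch of how these parameters could drive the multiprocess generation loop; `generate_from_background` is a placeholder, not a function from this repository, and the real logic lives in the generation scripts:

```python
# Hypothetical sketch (not the actual implementation) of how run_gen.sh's
# parameters drive the multiprocess generation loop.
import multiprocessing as mp

def generate_from_background(bg_index, time_limit, out_dir):
    # Placeholder for the real rendering step: in the actual code this renders
    # curved text onto background image `bg_index` within `time_limit` seconds
    # and writes the result under `out_dir`.
    print(f"render bg {bg_index} (limit {time_limit}s) -> {out_dir}")

def worker(bg_index, instance_per_image, max_time, output_path):
    # Each background image is reused `instance_per_image` times.
    for _ in range(instance_per_image):
        generate_from_background(bg_index, max_time, output_path)

def run(begin_index, end_index, jobs, instance_per_image, max_time, output_path):
    # `jobs` worker processes split the background indices [begin_index, end_index].
    with mp.Pool(processes=jobs) as pool:
        pool.starmap(worker, [(i, instance_per_image, max_time, output_path)
                              for i in range(begin_index, end_index + 1)])

if __name__ == "__main__":
    run(begin_index=0, end_index=9, jobs=4,
        instance_per_image=2, max_time=30, output_path="./out")
```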
In the last step, only binary files are generated. To convert them to PNG files, run:

`source to_image.sh`
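The exact binary format is defined by the generation scripts, so check `to_image.sh` for the real conversion; purely as an illustration, assuming the output were an HDF5 file whose 3-D uint8 datasets are rendered images (an assumption, not this repository's actual format), a conversion could look like:

```python
# Purely illustrative sketch: ASSUMES the binary output is an HDF5 file whose
# 3-D uint8 datasets are rendered images. Check to_image.sh for the actual
# conversion performed by this repository.
import os
import h5py
from PIL import Image

def h5_to_png(h5_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with h5py.File(h5_path, "r") as db:
        def dump(name, obj):
            if isinstance(obj, h5py.Dataset) and obj.ndim == 3:
                arr = obj[...]
                if arr.dtype.name == "uint8":  # treat H x W x 3 uint8 data as RGB
                    out_name = name.replace("/", "_") + ".png"
                    Image.fromarray(arr).save(os.path.join(out_dir, out_name))
        db.visititems(dump)

h5_to_png("output/generated.h5", "output/png")  # example paths
```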
Add new fonts to the generator, since the official project only provides three sample fonts. **You should first add TTF files into `{font_dir}/newfonts`.** (An optional check for new font files is sketched after the parameter list below.)

`source add_new_fonts.sh`
All parameters:
- font_dir: the directory to store fonts.
- data_dir: the directory that stores dset.h5, models, fonts and newsgroup; the same as gen_data_path above.
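Before running the script, it can be worth verifying that every TTF file in `newfonts` actually loads; the following optional sketch uses pygame (already a dependency of the generator) and assumes the default `data/fonts` layout, so adjust `font_dir` to yours:

```python
# Optional check: try to load every .ttf file in {font_dir}/newfonts before
# registering it, so a corrupt font does not break rendering later.
import glob
import os
import pygame.freetype

pygame.freetype.init()
font_dir = "data/fonts"                      # example; use your font_dir
for path in glob.glob(os.path.join(font_dir, "newfonts", "*.ttf")):
    try:
        pygame.freetype.Font(path, 24)       # raises if the file is unreadable
        print("OK:", path)
    except Exception as err:
        print("FAILED:", path, "-", err)
```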
You must download the following files before starting generation:
- The data file, which can be downloaded here, includes:
- **dset.h5**: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note, this is just given as an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use.
- **data/fonts**: three sample fonts (do not update fontlist.txt; just add TTF fonts to the newfonts directory; see text_utils.py for details).
- **data/newsgroup**: Text-source (from the News Group dataset). This can be substituted with any text file. Look inside `text_utils.py` to see how the text inside this file is used by the renderer.
- **data/models/colors_new.cp**: Color-model (foreground/background text color model), learnt from the IIIT-5K word dataset.
- **data/models**: Other cPickle files (**char\_freq.cp**: frequency of each character in the text dataset; **font\_px2pt.cp**: conversion from pt to px for various fonts).
- The 8,000 background images used in the paper, along with their segmentation and depth masks, can be downloaded from:
  http://www.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>
  where <filename> can be:
  - **imnames.cp** [180K]: names of filtered files, i.e., those files which do not contain text (a small loading sketch is shown below).
  - **bg_img.tar.gz** [8.9G]: compressed image files (more than 8,000, so only use the filtered ones in imnames.cp).
  - **depth.h5** [15G]: depth maps.
  - **seg.h5** [6.9G]: segmentation maps.
Note: I do not own the copyright to these images.
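If you want to see which background images survive the filtering, `imnames.cp` can be loaded directly; a minimal sketch, assuming the files sit in your `bg_data_path` (the path below is only an example):

```python
# imnames.cp is a pickled list with the names of the filtered background
# images (those that do not already contain text). Example path below.
import pickle

with open("bg_data/imnames.cp", "rb") as f:
    # The file was written with Python 2's cPickle; under Python 3,
    # encoding="latin1" keeps the stored names readable.
    imnames = pickle.load(f, encoding="latin1")

print(len(imnames), "filtered background image names")
print(imnames[:5])
```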
A dataset with approximately 800,000 synthetic scene-text images generated with this code can be found here.
Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available here.
- `predict_depth.m`: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
- `run_ucm.m` and `floodFill.py`: for getting segmentation masks using gPb-UCM.
For an explanation of the fields in `dset.h5` (e.g., `seg`, `area`, `label`), please check this comment.
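As a quick way to explore these fields yourself, a short snippet like the following walks the file and prints each dataset's name, shape and attributes (the path is an example; the linked comment documents the exact layout):

```python
# Walk dset.h5 and print every dataset with its shape and attributes
# (for example, the attributes attached to the segmentation data).
import h5py

with h5py.File("data/dset.h5", "r") as db:   # example path
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, dict(obj.attrs))
    db.visititems(show)
```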
@JarveeLee has modified the pipeline for generating samples with Chinese text here.
Please refer to the paper for more information, or contact me (email address in the paper).