/CRNN_Tensorflow

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

Primary LanguagePython

CRNN_Tensorflow

Use tensorflow to implement a Deep Neural Network for scene text recognition mainly based on the paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition".You can refer to their paper for details http://arxiv.org/abs/1507.05717. Thanks for the author Baoguang Shi.
This model consists of a CNN stage, RNN stage and CTC loss for scene text recognition task.

Installation

This software has mostly been tested on Ubuntu 16.04(x64) using python3.5 and tensorflow 1.3.0 with cuda-8.0, cudnn-6.0 and a GTX-1070 GPU. Other versions of tensorflow have not been tested but might work properly above version 1.0.

The following methods are provided to install dependencies:

Docker

There are Dockerfiles inside the folder docker. Follow the instructions inside docker/README.md to build the images.

Conda

You can create a conda environment with the required dependencies using:

conda env create -f crnntf-env.yml

Pip

Required packages may be installed with

pip3 install -r requirements.txt

Test model

In this repo I uploaded a model trained on a subset of the Synth 90k. During data preparation process the dataset is converted into a tensorflow records which you can find in the data folder. You can test the trained model on the converted dataset by

python tools/test_shadownet.py --dataset_dir data/ --weights_path model/shadownet/shadownet_2017-09-29-19-16-33.ckpt-39999

Expected output is
Test output If you want to test a single image you can do it by

python tools/demo_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/shadownet/shadownet_2017-09-29-19-16-33.ckpt-39999

Example image_01 is
Example image1
Expected output_01 is
Example image1 output
Example image_02 is
Example image2
Expected output_02 is
Example image2 output Example image_03 is
Example image3
Expected output_03 is
Example image3 output Example image_04 is
Example image4
Expected output_04 is
Example image4 output

Train your own model

Data Preparation

Firstly you need to store all your image data in a root folder then you need to supply a txt file named sample.txt to specify the relative path to the image data dir and it's corresponding text label. For example

path/1/2/373_coley_14845.jpg coley
path/17/5/176_Nevadans_51437.jpg nevadans

Secondly you are supposed to convert your dataset into tensorflow records which can be done by

python tools/write_text_features --dataset_dir path/to/your/dataset --save_dir path/to/tfrecords_dir

All your training image will be scaled into (32, 100, 3) the dataset will be divided into train, test, validation set and you can change the parameter to control the ratio of them.

Train model

The whole training epoches are 40000 in my experiment. I trained the model with a batch size 32, initialized learning rate is 0.1 and decrease by multiply 0.1 every 10000 epochs. For more training parameters information you can check the global_configuration/config.py for details. To train your own model by

python tools/train_shadownet.py --dataset_dir path/to/your/tfrecords

You can also continue the training process from the snapshot by

python tools/train_shadownet.py --dataset_dir path/to/your/tfrecords --weights_path path/to/your/last/checkpoint

After several times of iteration you can check the log file in logs folder you are supposed to see the following contenent Training log The seq distance is computed by calculating the distance between two saparse tensor so the lower the accuracy value is the better the model performs.The train accuracy is computed by calculating the character-wise precision between the prediction and the ground truth so the higher the better the model performs.

During my experiment the loss drops as follows
Training loss The distance between the ground truth and the prediction drops as follows
Sequence distance

Experiment

The accuracy during training process rises as follows
Training accuracy

TODO

The model is trained on a subet of Synth 90k. So i will train a new model on the whold dataset to get a more robust model.The crnn model needs large of training data to get a rubust model.