text-detection-ctpn

Text detection mainly based on CTPN (Connectionist Text Proposal Network), implemented in TensorFlow. I use ID card detection as an example to demonstrate the results, but note that this model can be used for almost any horizontal scene text detection task. The original paper can be found here. The original Caffe repo can be found here. This repo is mainly based on the Faster R-CNN framework, so it still contains a lot of unused code. I'm still working on it. For more detail about the paper and code, see this blog.

setup

requirements: tensorflow 1.3, cython 0.24, opencv-python, easydict (installing Anaconda is recommended)
build the library

cd lib/utils
chmod +x make.sh
./make.sh

parameters

There are some parameters you may need to modify to suit your setup. You can find them in ctpn/text.yml.

USE_GPU_NMS # whether to use the NMS implemented in CUDA; if you do not have a GPU device, follow here to set up
DETECT_MODE # H represents horizontal mode, O represents oriented mode, default is H
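
As a minimal sketch of switching these two settings from a script, the helper below rewrites a `KEY: value` line in YAML text using only the standard library. The function name `set_yaml_flag` and the sample text are hypothetical, and the sketch assumes each setting sits on its own `KEY: value` line; check the real layout of ctpn/text.yml before relying on it.

```python
import re


def set_yaml_flag(yaml_text, key, value):
    """Return yaml_text with the first `key: ...` line set to `value`.

    Indentation and any trailing `# comment` are kept.
    Raises KeyError if no such line exists.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*:[ \t]*)"  # key and colon
        r"([^#\n]*?)"                                     # old value
        r"([ \t]*(?:#.*)?)$",                             # optional comment
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value) + m.group(3), yaml_text, count=1
    )
    if count == 0:
        raise KeyError(key)
    return new_text


# Hypothetical sample; the real file may contain more keys or nesting.
sample = "USE_GPU_NMS: True  # needs CUDA\nDETECT_MODE: H\n"
updated = set_yaml_flag(sample, "USE_GPU_NMS", False)  # CPU-only machine
updated = set_yaml_flag(updated, "DETECT_MODE", "O")   # oriented mode
print(updated)
```

Editing the file by hand works just as well. A helper like this is only useful when the same checkout is run on machines with and without a GPU.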

demo

Put your images in data/demo and run the demo from the repository root. The results will be saved in data/results.

python ./ctpn/demo.py
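
Before running the demo, it can help to confirm that data/demo actually contains images. The following is a small pre-flight sketch, not part of the repo: the function name `list_demo_images` and the accepted extensions are assumptions, and demo.py may apply its own filter.

```python
import os

# Assumed set of extensions; adjust to whatever demo.py actually accepts.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_demo_images(demo_dir):
    """Return sorted paths of image files directly inside demo_dir."""
    if not os.path.isdir(demo_dir):
        return []
    paths = []
    for name in sorted(os.listdir(demo_dir)):
        path = os.path.join(demo_dir, name)
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return paths


if __name__ == "__main__":
    images = list_demo_images(os.path.join("data", "demo"))
    print("found %d image(s) in data/demo" % len(images))
```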

training

prepare data

First, download the pre-trained VGG model and put it at data/pretrain/VGG_imagenet.npy. You can download it from google drive or baidu yun.
Second, prepare the training data as described in the paper, or download the data I prepared from the link above. Alternatively, prepare your own data with the following steps.
Modify path and gt_path in prepare_training_data/split_label.py to match your dataset, then run

cd prepare_training_data
python split_label.py
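
For context, the CTPN paper represents a text line as a sequence of fixed-width (16-pixel) proposals, which is presumably why ground-truth boxes are split before training. The sketch below illustrates that idea for an axis-aligned box with integer, inclusive pixel coordinates. It is not the code of split_label.py; the function name `split_box` and the clipping behavior at the box edges are assumptions made for illustration.

```python
def split_box(x_min, y_min, x_max, y_max, width=16):
    """Split a horizontal box into strips aligned to a `width`-pixel grid.

    Coordinates are integer and inclusive. The first and last strips are
    clipped to the box, so they may be narrower than `width`.
    """
    strips = []
    start = (x_min // width) * width  # left edge of the grid cell holding x_min
    while start <= x_max:
        left = max(start, x_min)
        right = min(start + width - 1, x_max)
        strips.append((left, y_min, right, y_max))
        start += width
    return strips


for strip in split_box(10, 5, 40, 20):
    print(strip)
```

Each strip keeps the full height of the original box; only the horizontal extent is divided.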

split_label.py writes the prepared data to the current folder. Then run

python ToVoc.py

to convert the prepared training data into VOC format. This generates a folder named TEXTVOC. Move this folder to data/ and then run

cd ../data
ln -s TEXTVOC VOCdevkit2007
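
On systems where `ln` is unavailable, or when the step is scripted, the same link can be created from Python. This is a sketch assuming Python 3 and a platform that supports symlinks; the function name `link_voc_dir` is hypothetical.

```python
import os


def link_voc_dir(data_dir, source="TEXTVOC", link_name="VOCdevkit2007"):
    """Create data_dir/link_name as a relative symlink to data_dir/source."""
    source_path = os.path.join(data_dir, source)
    link_path = os.path.join(data_dir, link_name)
    if not os.path.isdir(source_path):
        raise FileNotFoundError("missing folder: " + source_path)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link from an earlier run
    elif os.path.exists(link_path):
        raise FileExistsError("refusing to overwrite: " + link_path)
    # A relative target mirrors `ln -s TEXTVOC VOCdevkit2007` run inside data/.
    os.symlink(source, link_path)
    return link_path
```

Calling `link_voc_dir("data")` from the repository root should have the same effect as the two shell commands above.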

train

Simply run

python ./ctpn/train_net.py

You can modify some hyperparameters in ctpn/text.yml, or just use the parameters I set.
The model I provided in checkpoints was trained on a GTX 1070 for 50k iterations.
If you are using CUDA NMS, training takes about 0.2s per iteration, so 50k iterations take roughly 2.5 to 3 hours.
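
A quick check of the estimate above: total time is the number of iterations times the seconds per iteration. The helper name `training_hours` is only for illustration.

```python
def training_hours(iterations, seconds_per_iter):
    """Estimated wall-clock training time in hours."""
    return iterations * seconds_per_iter / 3600.0


# 50k iterations at 0.2 s each is 10,000 s.
print(round(training_hours(50000, 0.2), 2))  # 2.78
```

At exactly 0.2 s per iteration the total comes to just under 2.8 hours, so treat the quoted training time as approximate. Without CUDA NMS, expect each iteration to take longer.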

roadmap

some results

NOTICE: all the photos used below were collected from the internet. If any of them affects you, please contact me and I will delete them.

comparison of horizontal and oriented text connector

The oriented text connector has been implemented. It is working, but still needs further improvement.
The left figure is the result for DETECT_MODE H, the right figure for DETECT_MODE O.

Ruochen0715/text-detection-ctpn