Customized from the repo https://github.com/eragonruan/text-detection-ctpn, with Tesseract text recognition applied to each detected box.
The NMS and bbox utilities are written in Cython, so you have to build the library first.
cd utils/bbox
chmod +x make.sh
./make.sh
This generates nms.so and bbox.so in the current folder.
- download the ckpt file from Google Drive or Baidu Yun
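To confirm the build worked before running the demo, a quick check like the one below may help. This is a minimal sketch and not part of the repository: the helper name `find_built_extensions` is hypothetical, and it assumes the compiled files start with `nms` or `bbox` and end in `.so` (the actual file names may carry a platform-specific suffix).

```python
import glob
import os


def find_built_extensions(folder, names=("nms", "bbox")):
    """Return {module name: sorted list of matching .so files} for a folder.

    Hypothetical helper: assumes compiled files are named like
    nms.so or nms.<platform-tag>.so inside `folder`.
    """
    found = {}
    for name in names:
        pattern = os.path.join(folder, name + "*.so")
        found[name] = sorted(glob.glob(pattern))
    return found


if __name__ == "__main__":
    # Run from the repository root after ./make.sh has finished.
    for name, paths in find_built_extensions("utils/bbox").items():
        status = "ok" if paths else "MISSING"
        print(f"{name}: {status} {paths}")
```

If either module is reported as missing, rerun the build in utils/bbox before starting the demo.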
- put "checkpoints_mlt/" in "text-detection-ctpn/"
- put your images in "data/demo"; the output images and text are written to "data/res". Run the demo from the root:
python main/demo.py
- directory structure:
- text recognition with Tesseract:
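As an illustration of the per-box recognition step, here is a minimal sketch of cropping each detected box and passing it to Tesseract. It is not the code in main/demo.py. It assumes each result line has the form `x1,y1,x2,y2,x3,y3,x4,y4,score`, and that Pillow, pytesseract, and the tesseract binary are installed; the function names and the 2-pixel padding are hypothetical.

```python
def parse_box_line(line):
    """Parse one result line, assumed to be 'x1,y1,x2,y2,x3,y3,x4,y4,score'."""
    parts = line.strip().split(",")
    # Keep only the 8 corner coordinates; the trailing score is ignored here.
    return [int(float(p)) for p in parts[:8]]


def quad_to_crop_box(coords, image_width, image_height, pad=2):
    """Convert 8 quad coordinates to a (left, upper, right, lower) rectangle.

    The rectangle is the quad's bounding box, padded by `pad` pixels and
    clamped to the image bounds.
    """
    xs = coords[0::2]
    ys = coords[1::2]
    left = max(min(xs) - pad, 0)
    upper = max(min(ys) - pad, 0)
    right = min(max(xs) + pad, image_width)
    lower = min(max(ys) + pad, image_height)
    return (left, upper, right, lower)


def recognize_boxes(image_path, box_lines):
    """Run Tesseract on every box of one image and return the text per box."""
    # Imported lazily so the helpers above work without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    texts = []
    for line in box_lines:
        if not line.strip():
            continue
        box = quad_to_crop_box(parse_box_line(line), image.width, image.height)
        texts.append(pytesseract.image_to_string(image.crop(box)).strip())
    return texts
```

Cropping the axis-aligned bounding box keeps the sketch simple; strongly rotated text would likely need to be deskewed before recognition.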
- First, download the pre-trained VGG net model and put it in data/vgg_16.ckpt. You can download it from tensorflow/models.
- Second, download the dataset we prepared from Google Drive or Baidu Yun. Put the downloaded data in data/dataset/mlt, then start the training.
- Also, you can prepare your own dataset according to the following steps.
- Modify DATA_FOLDER and OUTPUT in utils/prepare/split_label.py to match your dataset, then run split_label.py from the root:
python ./utils/prepare/split_label.py
- It will generate the prepared data in data/dataset/
- An example of the input file format for split_label.py can be found in gt_img_859.txt, and the corresponding output file is img_859.txt. A demo image of the prepared data is shown below.
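CTPN detects text as a sequence of fixed-width proposals (16 pixels wide in the original paper), so the prepared labels break each text box into narrow vertical strips. The following is a simplified sketch of that idea, not the code in split_label.py: it assumes axis-aligned boxes, a 16-pixel grid, and an `xmin,ymin,xmax,ymax` output line. The function names are hypothetical, and the real script may handle rotated polygons and image resizing differently.

```python
def split_into_strips(xmin, ymin, xmax, ymax, stride=16):
    """Split an axis-aligned box into strips aligned to a `stride`-pixel grid.

    Returns a list of (xmin, ymin, xmax, ymax) tuples. Each strip is exactly
    `stride` pixels wide (inclusive coordinates), and together the strips
    cover the box horizontally.
    """
    strips = []
    # Start at the grid column that contains xmin.
    x = (xmin // stride) * stride
    while x <= xmax:
        strips.append((x, ymin, x + stride - 1, ymax))
        x += stride
    return strips


def format_strip(strip):
    """Format one strip as an assumed 'xmin,ymin,xmax,ymax' label line."""
    return ",".join(str(v) for v in strip)


if __name__ == "__main__":
    for strip in split_into_strips(20, 5, 50, 30):
        print(format_strip(strip))
```

Because the strips snap to the grid, the first and last strips can extend slightly past the original box edges.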
Simply run
python main/train.py
- The model provided in checkpoints_mlt was trained on a GTX 1070 for 50k iterations. Each iteration takes about 0.25 s, so 50k iterations take about 3.5 hours.
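The time estimate follows directly from the two figures above. A small sketch of the arithmetic, which can be adapted to a different per-iteration time on other hardware (the variable names are illustrative):

```python
iterations = 50_000
seconds_per_iteration = 0.25  # figure reported above for a GTX 1070

total_seconds = iterations * seconds_per_iteration
total_hours = total_seconds / 3600

print(f"{total_seconds:.0f} s = {total_hours:.2f} h")  # 12500 s = 3.47 h
```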
NOTICE:
All photos used below were collected from the internet. If any of them affects you, please contact me and I will delete it.