This project is a PyTorch implementation of our paper "A quadrilateral scene text detector with two-stage network architecture" (https://www.sciencedirect.com/science/article/abs/pii/S0031320320300364).
If you have any questions, contact us at wyt@pku.edu.cn or bahuangliuhe@pku.edu.cn.
- Python 3.7
- PyTorch 0.3.0
- CUDA 8.0 or higher
- ICDAR2015 (or another dataset): Please download the data and create softlinks in the folder data/.
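The softlink step can be scripted. The following is a minimal sketch, not part of this repository: the function name `link_dataset` and the link name `icdar2015` are hypothetical, and the link name the data loader actually expects should be checked in the code before use.

```python
import os
import tempfile


def link_dataset(source_dir, data_root, link_name):
    """Create data_root/link_name as a symbolic link pointing to source_dir."""
    os.makedirs(data_root, exist_ok=True)
    link_path = os.path.join(data_root, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by an earlier run
    os.symlink(os.path.abspath(source_dir), link_path)
    return link_path


if __name__ == "__main__":
    # Demonstration inside a temporary directory; real paths will differ.
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = os.path.join(tmp, "downloads", "icdar2015")
        os.makedirs(downloaded)
        # "icdar2015" is an assumed link name, used only for illustration.
        link = link_dataset(downloaded, os.path.join(tmp, "data"), "icdar2015")
        print(link, "->", os.readlink(link))
```

Linking instead of copying keeps a single copy of the dataset on disk while letting the code read it from data/.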
We used two pretrained models in our experiments, VGG and ResNet101. You can download these two models from:
Download them and put them into data/pretrained_model/.
NOTE: We compared the pretrained models from PyTorch and Caffe and, surprisingly, found that the Caffe pretrained models perform slightly better than the PyTorch ones. We suggest using the Caffe pretrained models from the above link to reproduce our results.
If you want to use PyTorch pretrained models, remember to convert images from BGR to RGB and to use the same data transform (mean subtraction and normalization) as was used for the pretrained model.
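As an illustration of the BGR-to-RGB conversion and normalization, here is a minimal NumPy sketch. It assumes the image was loaded in BGR order as a uint8 array (as OpenCV does) and that the pretrained weights come from torchvision, whose ImageNet models expect RGB inputs scaled to [0, 1] and normalized with the mean and standard deviation below. The function name is hypothetical and this is not the repository's own data pipeline.

```python
import numpy as np

# ImageNet statistics used by torchvision's pretrained models
# (RGB channel order, pixel values scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_for_pytorch_model(image_bgr):
    """Convert an HxWx3 uint8 BGR image to a normalized 3xHxW float32 array."""
    image_rgb = image_bgr[:, :, ::-1]                # reverse channels: BGR -> RGB
    scaled = image_rgb.astype(np.float32) / 255.0    # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout, as expected by PyTorch convolution layers.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

The Caffe pretrained models use a different convention, so this transform should only be applied when the PyTorch weights are used.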
As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch to compile the CUDA code:
GPU model | Architecture |
---|---|
TitanX (Maxwell/Pascal) | sm_52 |
GTX 960M | sm_50 |
GTX 1080 (Ti) | sm_61 |
Grid K520 (AWS g2.2xlarge) | sm_30 |
Tesla K80 (AWS p2.xlarge) | sm_37 |
More details about setting the architecture can be found here or here
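The table above can be kept as a small lookup when scripting the build. This is only a sketch: the dictionary copies the rows of the table, the helper name `arch_flag` is hypothetical, and where the resulting nvcc option has to be placed in the build scripts is not shown here.

```python
# Architecture values copied from the table above.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "GTX 960M": "sm_50",
    "GTX 1080 (Ti)": "sm_61",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}


def arch_flag(gpu_model):
    """Return the nvcc -arch option for a GPU model listed in the table."""
    try:
        return "-arch=" + GPU_ARCH[gpu_model]
    except KeyError:
        raise KeyError("GPU model not in the table: %r" % gpu_model)


if __name__ == "__main__":
    print(arch_flag("GTX 1080 (Ti)"))
```

GPUs that are not in the table need their architecture looked up separately.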
Install all the python dependencies using pip:
pip install -r requirements.txt
Compile the CUDA dependencies using the following commands:
cd lib
sh make.sh
It will compile all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; please compile it yourself if you are using a different Python version.
Before training, set the directories used to save and load the trained models: change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to suit your environment.
To train the model with res101 on the ICDAR dataset, simply run:
CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
--dataset icdar --net res101 \
--bs $BATCH_SIZE --nw $WORKER_NUMBER \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
--cuda
where 'bs' is the batch size (default 1). The --net argument selects the backbone (res101 here).
Above, BATCH_SIZE and WORKER_NUMBER can be set according to your GPU memory size. On a Titan Xp with 12 GB of memory, the batch size can be up to 4.
If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:
python trainval_net.py --dataset icdar --net res101 \
--bs 24 --nw 8 \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
--cuda --mGPUs
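To see why a batch size of 24 fits on 8 GPUs, it helps to compute the per-GPU share. The sketch below assumes that --mGPUs spreads each batch across the visible GPUs in a data-parallel fashion. That assumption is not stated in this README, and the balanced split for uneven sizes is purely illustrative and may differ from what the framework does.

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Split a batch across GPUs as evenly as possible (illustrative only)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, remainder = divmod(total_batch_size, num_gpus)
    # When the split is uneven, the first `remainder` GPUs take one extra image.
    return [base + 1 if i < remainder else base for i in range(num_gpus)]


if __name__ == "__main__":
    # --bs 24 on 8 GPUs gives 3 images per GPU.
    print(per_gpu_batch_size(24, 8))
```

Under this assumption each GPU receives 3 images, which stays within the per-GPU limit of 4 mentioned for a 12 GB Titan Xp.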
If you want to evaluate the detection performance of a trained res101 model on the ICDAR test set, simply run:
python test_net.py --dataset icdar --net res101 \
--checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
--cuda
Specify the model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.
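When evaluating several checkpoints, the command above can be assembled programmatically. The following sketch only builds the argument list from the flags shown in this README; the helper name `build_test_command` is hypothetical and nothing is executed.

```python
def build_test_command(session, epoch, checkpoint,
                       dataset="icdar", net="res101", use_cuda=True):
    """Return the argument list for test_net.py using the flags shown above."""
    command = [
        "python", "test_net.py",
        "--dataset", dataset,
        "--net", net,
        "--checksession", str(session),
        "--checkepoch", str(epoch),
        "--checkpoint", str(checkpoint),
    ]
    if use_cuda:
        command.append("--cuda")
    return command


if __name__ == "__main__":
    # SESSION=1, EPOCH=6, CHECKPOINT=416, as in the example above.
    print(" ".join(build_test_command(1, 6, 416)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root to launch the evaluation.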
@article{wang2020quadrilateral,
title={A quadrilateral scene text detector with two-stage network architecture},
author={Wang, Siwei and Liu, Yudong and He, Zheqi and Wang, Yongtao and Tang, Zhi},
journal={Pattern Recognition},
volume={102},
pages={107230},
year={2020},
publisher={Elsevier}
}