Pyramid Mask Text Detector, designed by the SenseTime Video Intelligence Research team.

PMTD: Pyramid Mask Text Detector

This project hosts the inference code for implementing the PMTD algorithm for text detection, as presented in our paper:

Pyramid Mask Text Detector;
Liu Jingchao, Liu Xuebo, Sheng Jie, Liang Ding, Li Xin and Liu Qingjie;
arXiv preprint arXiv:1903.11800 (2019).

The full paper is available at: https://arxiv.org/abs/1903.11800.

Installation

Check INSTALL.md for installation instructions.

Trained model

We provide trained models for the ICDAR 2017 MLT dataset here and the ICDAR 2015 dataset here for download. Note that the results differ slightly from those reported in the paper: PMTD was originally built on a private codebase, and the inference code in this project is reimplemented on top of maskrcnn-benchmark.

ICDAR 2017

Method         | Precision | Recall | F-measure
This project   | 85.13%    | 72.85% | 78.51%
Paper reported | 85.15%    | 72.77% | 78.48%

ICDAR 2015

Method         | Precision | Recall | F-measure
This project   | 87.48%    | 91.26% | 89.33%
Paper reported | 87.43%    | 91.30% | 89.33%
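
As a sanity check on the numbers above, the sketch below recomputes the F-measure from precision and recall. It assumes the F-measure is the usual harmonic mean of the two, which this README does not state explicitly. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one by about 0.01.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same unit for both inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from the result tables
rows = {
    "ICDAR 2017, this project": (85.13, 72.85, 78.51),
    "ICDAR 2017, paper": (85.15, 72.77, 78.48),
    "ICDAR 2015, this project": (87.48, 91.26, 89.33),
    "ICDAR 2015, paper": (87.43, 91.30, 89.33),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_measure(p, r):.2f}, reported {reported:.2f}")
```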

A quick demo

cd PROJECT_ROOT
python demo/PMTD_demo.py \
--image_path=datasets/icdar2017mlt/ch8_validation_images/img_1.jpg \
--model_path=models/PMTD_ICDAR2017MLT.pth
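
If you prefer to launch the demo from Python, a minimal sketch is shown below. It only checks that the image and the model file exist and then builds the same command line as above. The helper name `build_demo_command` is hypothetical and not part of this project.

```python
import sys
from pathlib import Path


def build_demo_command(project_root, image_path, model_path):
    """Return the argv list for demo/PMTD_demo.py after checking the inputs.

    Hypothetical helper: image_path and model_path are relative to project_root.
    """
    root = Path(project_root)
    for rel in (image_path, model_path):
        if not (root / rel).is_file():
            raise FileNotFoundError(f"missing file: {root / rel}")
    return [
        sys.executable,
        "demo/PMTD_demo.py",
        f"--image_path={image_path}",
        f"--model_path={model_path}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, cwd=project_root, check=True)`, which runs the demo from PROJECT_ROOT in the same way as the shell commands above.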

Perform testing on ICDAR 2017 MLT dataset

Prepare dataset

We recommend symlinking the ICDAR 2017 MLT dataset into datasets/ as follows:

# e.g. ~/Projects/PMTD
cd PROJECT_ROOT

mkdir -p datasets/icdar2017mlt
cd datasets/icdar2017mlt

# symlink for images and annotations
ln -s /path_to_icdar2017mlt_dataset/ch8_test_images
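
The same layout can be prepared from Python. The sketch below mirrors the shell commands above on a POSIX system. The function name `link_test_images` is hypothetical, and `dataset_dir` stands for wherever your copy of the dataset lives.

```python
from pathlib import Path


def link_test_images(project_root, dataset_dir, name="ch8_test_images"):
    """Symlink <dataset_dir>/<name> into <project_root>/datasets/icdar2017mlt/."""
    source = Path(dataset_dir) / name
    if not source.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {source}")

    target_dir = Path(project_root) / "datasets" / "icdar2017mlt"
    target_dir.mkdir(parents=True, exist_ok=True)  # same as mkdir -p

    link = target_dir / name
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier run
    link.symlink_to(source.resolve(), target_is_directory=True)
    return link
```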

Generate COCO labels for the dataset

# ${PWD} = datasets/icdar2017mlt
mkdir annotations
cd PROJECT_ROOT
python demo/utils/generate_icdar2017.py
# the labels are written to PROJECT_ROOT/datasets/icdar2017mlt/annotations/test_coco.json
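
For orientation, a COCO-style label file is a JSON object with `images`, `annotations` and `categories` lists. The sketch below builds such a skeleton for a list of images. It is not the code of demo/utils/generate_icdar2017.py, whose exact output is not shown here; the single `text` category and the file name in the usage note are assumptions made for illustration.

```python
import json


def build_coco_skeleton(images, category_name="text"):
    """Build a minimal COCO-style dict.

    images: iterable of (file_name, width, height) tuples.
    The annotation list is left empty, as for an unlabeled test set.
    """
    return {
        "images": [
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
            for image_id, (file_name, width, height) in enumerate(images, start=1)
        ],
        "annotations": [],
        "categories": [{"id": 1, "name": category_name}],
    }


def save_coco(coco, path):
    """Write the dict as JSON to the given path."""
    with open(path, "w") as f:
        json.dump(coco, f)
```

For example, `build_coco_skeleton([("img_1.jpg", 1280, 720)])` yields one image entry with `id` 1 and an empty annotation list.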

Test images

In the test stage, we use a single TITANX 11G GPU with a batch size of 4. If you run into an out-of-memory (OOM) error, reduce TEST.IMS_PER_BATCH in configs/e2e_PMTD_R_50_FPN_1x_test.yaml.
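
One way to find a batch size that fits is to halve it until inference succeeds. The sketch below shows that idea in isolation. `run_inference` is a hypothetical callable that takes a batch size, and `MemoryError` stands in for whatever exception your framework raises on OOM (PyTorch, for instance, reports CUDA OOM as a `RuntimeError`).

```python
def run_with_batch_backoff(run_inference, batch_size=4):
    """Call run_inference(batch_size), halving the batch size after each OOM.

    Returns (batch_size_that_worked, result). MemoryError is a stand-in for
    the framework-specific out-of-memory exception.
    """
    while batch_size >= 1:
        try:
            return batch_size, run_inference(batch_size)
        except MemoryError:
            batch_size //= 2  # 4 -> 2 -> 1 -> 0 (loop ends)
    raise MemoryError("inference does not fit in memory even with batch size 1")
```

The batch size that worked can then be written back to TEST.IMS_PER_BATCH in the config file.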

# the downloaded model should be placed at: models/PMTD_ICDAR2017MLT.pth
python tools/test_net.py --config=configs/e2e_PMTD_R_50_FPN_1x_ICDAR2017MLT_test.yaml
# results are written to PROJECT_ROOT/inference/icdar_2017_mlt_test/
# - bbox.json // when using coco evaluation criterion
# - segm.json // when using coco evaluation criterion
# - dataset.pth
# - predictions.pth
# - results_{scale}.pth, where scale=1600 by default
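
To take a quick look at the detections, the sketch below reads a result file and keeps the confident boxes. It assumes bbox.json follows the standard COCO results format, i.e. a JSON list of objects with `image_id`, `category_id`, `bbox` (`[x, y, width, height]`) and `score`. The function names and the 0.5 threshold are illustrative only.

```python
import json


def load_detections(path, min_score=0.5):
    """Load COCO-style results and keep entries with score >= min_score."""
    with open(path) as f:
        results = json.load(f)
    return [det for det in results if det["score"] >= min_score]


def count_per_image(detections):
    """Return {image_id: number of detections} for the given list."""
    counts = {}
    for det in detections:
        counts[det["image_id"]] = counts.get(det["image_id"], 0) + 1
    return counts
```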

Convert results to ICDAR 2017 submission format

python demo/utils/convert_results_to_icdar.py
# results are written to PROJECT_ROOT/inference/icdar_2017_mlt_test/
# - icdar.zip
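
To illustrate what such an archive typically contains, the sketch below writes one text file per image into a zip. The layout is an assumption based on the usual ICDAR detection submission convention (files named `res_img_<n>.txt`, one line per box with four corner points followed by a confidence); it is not taken from demo/utils/convert_results_to_icdar.py, so check the challenge instructions before relying on it.

```python
import zipfile


def write_icdar_zip(zip_path, detections):
    """Write detections to a zip archive, one text file per image.

    detections: {image_index: [(quad, score), ...]} where quad holds the
    eight coordinates x1, y1, ..., x4, y4.
    Assumed layout: res_img_<image_index>.txt with lines
    "x1,y1,x2,y2,x3,y3,x4,y4,score".
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for image_index, items in sorted(detections.items()):
            lines = [
                ",".join(str(int(round(v))) for v in quad) + f",{score:.4f}"
                for quad, score in items
            ]
            # An image without detections still gets an (empty) file.
            zf.writestr(f"res_img_{image_index}.txt", "".join(line + "\n" for line in lines))
```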

Finally, submit icdar.zip to ICDAR 2017 MLT.

Citations

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows:

@article{liu2019pyramid,
  title={Pyramid Mask Text Detector},
  author={Liu, Jingchao and Liu, Xuebo and Sheng, Jie and Liang, Ding and Li, Xin and Liu, Qingjie},
  journal={arXiv preprint arXiv:1903.11800},
  year={2019}
}

License

Maskrcnn-benchmark is released under the MIT license. PMTD is released under the Apache 2.0 license.