AE TextSpotter

Introduction

This is the official implementation of AE TextSpotter, which uses linguistic information to resolve ambiguity in text detection. The code is based on MMDetection v1.0rc1.

Recommended environment

Python 3.6+
PyTorch 1.1.0
torchvision 0.2.1
pytorch_transformers 1.1.0
mmcv 0.2.13
Polygon3
opencv-python 4.4.0

Install

Please refer to MMDetection v1.0rc1 for installation.

Preparing data

Step1: Download the dataset from ICDAR 2019 ReCTS.

Step2: Arrange "data/ReCTS" as follows:

data/ReCTS/
├── train
│   ├── img
│   └── gt
└── test
    └── img
The files "TDA_ReCTS_train_list.txt" and "TDA_ReCTS_val_list.txt" in "data/ReCTS/" are downloaded from TDA-ReCTS. The other JSON files can be generated by running "python tools/rects_prepare_data.py".
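Before running the preparation script, it can help to confirm that the layout matches the one described here. The following is a minimal sketch, not part of this repository: the function name `missing_rects_entries` and the two constants are hypothetical, and the paths are simply the ones listed above.

```python
from pathlib import Path

# Paths copied from the layout described above; the names below are hypothetical helpers.
EXPECTED_DIRS = ["train/img", "train/gt", "test/img"]
EXPECTED_FILES = ["TDA_ReCTS_train_list.txt", "TDA_ReCTS_val_list.txt"]


def missing_rects_entries(root="data/ReCTS"):
    """Return the expected sub-directories and list files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    absent = missing_rects_entries()
    if absent:
        print("Missing under data/ReCTS:", ", ".join(absent))
    else:
        print("data/ReCTS layout looks complete.")
```

The check only looks at the directories and list files named in this README; it does not validate the JSON files produced by "tools/rects_prepare_data.py".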

Step3: Download and unzip bert-base-chinese.zip in the root of this repository.

unzip bert-base-chinese.zip

Training

Step1:

tools/rects_dist_train.sh local_configs/rects_ae_textspotter_r50_1x.py 8

Step2:

tools/rects_dist_train.sh local_configs/rects_ae_textspotter_lm_r50_1x.py 8

Test

TDA-ReCTS

tools/rects_dist_test.sh local_configs/rects_ae_textspotter_lm_r50_1x.py work_dirs/rects_ae_textspotter_lm_r50_1x/latest.pth 8 --json_out results.json

ICDAR 2019 ReCTS Task 4: End-to-End Text Spotting

tools/rects_dist_test.sh local_configs/rects_ae_textspotter_lm_r50_1x_test.py work_dirs/rects_ae_textspotter_lm_r50_1x/latest.pth 8 --json_out results_test.json
python tools/rects_trans2submit.py

Visualization

python tools/rects_test.py local_configs/rects_ae_textspotter_lm_r50_1x.py work_dirs/rects_ae_textspotter_lm_r50_1x/latest.pth --show

Evaluation

The training list, validation list, and evaluation script of this code come from TDA-ReCTS.

python tools/rects_eval.py

The output of the evaluation script should be:

[Best F-Measure] p: 84.94, r: 78.10, f: 81.37, 1-ned: 51.02, best_score_th: 0.569
[Best 1-NED]     p: 86.68, r: 76.09, f: 81.04, 1-ned: 51.51, best_score_th: 0.626
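As a quick sanity check on the numbers above, the f value can be recomputed from p and r. This sketch assumes that f is the standard F-measure, i.e. the harmonic mean of precision and recall; it is not taken from "tools/rects_eval.py". Because the printed p and r are already rounded, the recomputed value may differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (p, r, f) triples copied from the evaluation output above.
reported = {
    "Best F-Measure": (84.94, 78.10, 81.37),
    "Best 1-NED": (86.68, 76.09, 81.04),
}

for name, (p, r, f) in reported.items():
    # p and r are rounded to two decimals, so expect small last-digit differences.
    print(f"{name}: recomputed f = {f_measure(p, r):.2f}, reported f = {f}")
```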

Results and Models

TDA-ReCTS

| Method | Precision (%) | Recall (%) | F-measure (%) | 1-NED (%) | Model |
| --- | --- | --- | --- | --- | --- |
| AE TextSpotter | 84.94 | 78.10 | 81.37 | 51.51 | Google Drive |
| AE TextSpotter (Paper) | 84.78 | 78.28 | 81.39 | 51.32 | - |

ICDAR 2019 ReCTS

| Method | Precision (%) | Recall (%) | F-measure (%) | 1-NED (%) | Model |
| --- | --- | --- | --- | --- | --- |
| AE TextSpotter | 93.38 | 89.98 | 91.65 | 71.83 | Same as TDA-ReCTS |
| AE TextSpotter (Paper) | 92.60 | 91.01 | 91.80 | 71.81 | - |
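The 1-NED column reports one minus the normalized edit distance between recognized and ground-truth text. The sketch below illustrates the commonly used definition, where each Levenshtein distance is divided by the length of the longer string and the result is averaged over all pairs. This is an assumption for illustration only: the function names are hypothetical, and the evaluation script in this repository may differ, for example in how predictions are matched to ground-truth instances.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def one_minus_ned(pairs):
    """1 - mean normalized edit distance over (ground_truth, prediction) pairs."""
    if not pairs:
        return 0.0  # convention chosen for this sketch only
    total = 0.0
    for gt, pred in pairs:
        longest = max(len(gt), len(pred))
        total += edit_distance(gt, pred) / longest if longest else 0.0
    return 1.0 - total / len(pairs)


# Made-up example pairs, not taken from the dataset.
examples = [("北京烤鸭", "北京烤鸡"), ("abc", "abc")]
print(one_minus_ned(examples))
```

Dividing by the longer string keeps each per-pair term in the range 0 to 1, so the final score is also between 0 and 1 (multiply by 100 for the percentage form used in the tables).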

License

This project is released under the Apache 2.0 license.

Citation

If you use this work in your research, please cite us.

@inproceedings{wenhai2020ae,
  title={AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting},
  author={Wang, Wenhai and Liu, Xuebo and Ji, Xiaozhong and Xie, Enze and Liang, Ding and Yang, ZhiBo and Lu, Tong and Shen, Chunhua and Luo, Ping},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

Other Projects:

PAN (ICCV 2019): https://github.com/whai362/pan_pp.pytorch

PSENet (CVPR 2019): https://github.com/whai362/PSENet