Re-implementation of MASTER by mmocr

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

MASTER-mmocr

Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Result
  5. Coming Soon
  6. License
  7. Citations
  8. Acknowledgements

About The Project

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition, built on MMOCR, an open-source toolbox based on PyTorch. The overall architecture is shown below.

MASTER's architecture

Dependency

Getting Started

Prerequisites

Installation

  1. Install mmdetection. See the mmdetection installation guide for details.

    # The mmdetection-2.11.0 source code is bundled with this project.
    # You can cd into it and install it from there (recommended).
    cd ./mmdetection-2.11.0
    pip install -v -e .
  2. Install mmocr. See the MMOCR installation guide for details.

    # install mmocr
    cd ./MASTER_mmocr
    pip install -v -e .
  3. Install mmcv-full-1.3.4. See the MMCV installation guide for details.

    # general form: fill in {mmcv_version}, {cu_version}, and {torch_version}
    pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

    # example: install mmcv-full 1.3.4 for torch 1.8.0 and CUDA 10.2
    pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
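
The placeholders in the general mmcv-full command can also be filled in programmatically. The following is a minimal sketch, not part of this project: the helper name `mmcv_install_command` is hypothetical, and it assumes the `cu102` / `torch1.8.0` tag format seen in the example above applies to other versions as well.

```python
# Hypothetical helper (not part of this repository): fills in the URL
# template from the installation step above.
MMCV_INDEX_TEMPLATE = (
    "https://download.openmmlab.com/mmcv/dist/"
    "{cu_version}/{torch_version}/index.html"
)


def mmcv_install_command(mmcv_version, cuda_version, torch_version):
    """Return the pip command for a given mmcv-full / CUDA / torch combination.

    Assumes the tag format from the example above:
    CUDA "10.2" -> "cu102", torch "1.8.0" -> "torch1.8.0".
    """
    cu_tag = "cu" + cuda_version.replace(".", "")
    torch_tag = "torch" + torch_version
    index_url = MMCV_INDEX_TEMPLATE.format(
        cu_version=cu_tag, torch_version=torch_tag
    )
    return f"pip install mmcv-full=={mmcv_version} -f {index_url}"


if __name__ == "__main__":
    # Reproduces the example command for mmcv-full 1.3.4, CUDA 10.2, torch 1.8.0.
    print(mmcv_install_command("1.3.4", "10.2", "1.8.0"))
```

Whether a prebuilt wheel exists for a given combination depends on the download index, so check the generated URL before relying on it.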

Usage

The usage of this project is consistent with MMOCR-0.2.0. See the MMOCR documentation for usage details.

For training, run the following command:

CUDA_VISIBLE_DEVICES={device_id} PORT={port_number} ./tools/dist_train.sh {config_path} {work_dir} {gpu_number}

# example
CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_train.sh ./configs/textrecog/master/master_ResnetExtra_academic_dataset_dynamic_mmfp16.py /expr/mmocr_text_line_recognition/ 1
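
When launching several experiments, it can help to assemble the same command from Python. The sketch below is illustrative only: `build_dist_train` and `as_shell_line` are hypothetical helpers, and the assumption that `{gpu_number}` equals the number of ids in `CUDA_VISIBLE_DEVICES` is inferred from the example above.

```python
import shlex


def build_dist_train(config_path, work_dir, gpu_ids, port=29500,
                     script="./tools/dist_train.sh"):
    """Return (env, argv) for the training command shown above.

    Hypothetical helper: assumes the GPU count passed to the script
    equals the number of device ids made visible.
    """
    env = {
        "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids),
        "PORT": str(port),
    }
    argv = [script, config_path, work_dir, str(len(gpu_ids))]
    return env, argv


def as_shell_line(env, argv):
    """Render env and argv as a single shell line, quoting where needed."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    command = " ".join(shlex.quote(arg) for arg in argv)
    return f"{prefix} {command}"


if __name__ == "__main__":
    env, argv = build_dist_train(
        "./configs/textrecog/master/"
        "master_ResnetExtra_academic_dataset_dynamic_mmfp16.py",
        "/expr/mmocr_text_line_recognition/",
        gpu_ids=[0],
    )
    print(as_shell_line(env, argv))
```

To run the command directly instead of printing it, `argv` can be passed to `subprocess.run` with `env` merged into a copy of `os.environ`.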

Note:

  • As mentioned in the Prerequisites section, we train on synthetic image datasets and evaluate on real image datasets. The 7 real image datasets mentioned above are evaluated at each evaluation interval.

Result

Dataset    Paper reported accuracy (%)    Our accuracy (%)
IIIT5K     95.0                           95.07
SVT        90.6                           90.42
IC03       96.4                           95.58
IC13       95.3                           96.03
IC15       79.4                           80.95
SVTP       84.5                           84.34
CUTE80     87.5                           90.62
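
To make the comparison easier to read, the per-dataset gaps can be computed from the table above. This is a small sketch with hypothetical helper names. The unweighted mean over the 7 datasets is an assumption made here for illustration; it is not a metric reported by the paper or by this project.

```python
# (paper reported accuracy, our accuracy), copied from the table above.
RESULTS = {
    "IIIT5K": (95.0, 95.07),
    "SVT": (90.6, 90.42),
    "IC03": (96.4, 95.58),
    "IC13": (95.3, 96.03),
    "IC15": (79.4, 80.95),
    "SVTP": (84.5, 84.34),
    "CUTE80": (87.5, 90.62),
}


def accuracy_deltas(results):
    """Return our accuracy minus the paper's, per dataset, in points."""
    return {
        name: round(ours - paper, 2)
        for name, (paper, ours) in results.items()
    }


def unweighted_means(results):
    """Return (paper mean, our mean), treating every dataset equally.

    Assumption: equal weighting, which ignores differing dataset sizes.
    """
    n = len(results)
    paper_mean = sum(paper for paper, _ in results.values()) / n
    ours_mean = sum(ours for _, ours in results.values()) / n
    return paper_mean, ours_mean


if __name__ == "__main__":
    for name, delta in accuracy_deltas(RESULTS).items():
        print(f"{name:<8}{delta:+.2f}")
    paper_mean, ours_mean = unweighted_means(RESULTS)
    print(f"mean    {ours_mean - paper_mean:+.2f}")
```

On these numbers, the re-implementation is ahead of the paper on IIIT5K, IC13, IC15, and CUTE80, and behind on SVT, IC03, and SVTP.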

Coming Soon

  • 1st place solution for the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

If you find MASTER useful, please cite the paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}

Acknowledgements