Mask R-CNN for Text Detection

Introduction

  • This project uses a text detector based on Mask R-CNN, with an approach mainly inspired by fully convolutional networks. First, a CNN detects text blocks, from which character candidates are extracted. Then an FPN predicts the corresponding segmentation masks. Finally, each segmentation mask is used to find a suitable rectangular bounding box for its text instance.

  • The provided pre-trained model targets the ICDAR 2017 Incidental Scene Text Detection Challenge and was trained using only training images from ICDAR 2017 and ICDAR 2019.

  • This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates a bounding box and a segmentation mask for each instance of an object in the image. It is based on a Feature Pyramid Network (FPN) and a ResNet101 backbone.
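The last step of the pipeline described above, turning a segmentation mask into a rectangular box, can be sketched as follows. This is a minimal pure-Python illustration, not the repository's code: the function name `mask_to_box` is hypothetical, the mask is assumed to be a 2D list of 0/1 values, and the box is assumed to be axis-aligned (the actual implementation may fit rotated rectangles instead).

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) covering all nonzero pixels.

    `mask` is a 2D list of rows (assumption: 0 = background, nonzero = text).
    Returns None when the mask contains no text pixels.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # empty mask: no box to report
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 3x4 mask with a small text region (hypothetical data).
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
example_box = mask_to_box(example_mask)
```

An axis-aligned box is the simplest choice; for slanted scene text, a minimum-area rotated rectangle would usually fit the mask more tightly.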

Instance Segmentation Sample

Contents

  1. Installation
  2. Download
  3. Demo
  4. Test
  5. Train
  6. Examples
  7. Result

Installation

  • Python 3.6
  • TensorFlow v1.8.0+
  • Keras
  • opencv-python 3.4+
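One way to confirm that an environment meets the minimums listed above is a small version check. The sketch below is an assumption-laden helper, not part of the repository: the names `parse_version`, `version_at_least`, and `check_requirements` are hypothetical, Keras has no minimum in the list so only its presence is checked, and only the first three numeric components of a version string are compared.

```python
import importlib
import re


def parse_version(text):
    """Parse up to three leading numeric components, e.g. '1.8.0rc1' -> (1, 8, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '3.4'
    return tuple(parts)


def version_at_least(installed, minimum):
    """True when `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Module name -> minimum version taken from the list above (None = any version).
MINIMUMS = {"tensorflow": "1.8.0", "keras": None, "cv2": "3.4"}


def check_requirements(minimums=MINIMUMS):
    """Return {module: True/False}; False means missing or older than required."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = False
            continue
        installed = getattr(module, "__version__", "0")
        report[name] = minimum is None or version_at_least(installed, minimum)
    return report
```

Calling `check_requirements()` returns a dictionary mapping each module name to whether it passed. Note that `opencv-python` is imported under the module name `cv2`.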

Download

Models trained on ICDAR 2017 (training set) + ICDAR 2019 (training set): Download link

Test

If you've downloaded the pre-trained model, you can run:

python test.py 

A text file will then be written to the output path.

Train

Result

Results on ICDAR MLT 2017 Challenge 1 (text detection), with the model trained using only the ICDAR 2017 MLT training set and the ICDAR 2019 training set:

Method                 Precision (%)   Recall (%)   F-measure (%)
Mask R-CNN-resnet101   83.52           76.58        79.89
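For reference, the F-measure column relates to the other two as the harmonic mean of precision and recall. The sketch below assumes the standard definition F = 2PR / (P + R); the function name `f_measure` is illustrative and not taken from the repository or the ICDAR evaluation scripts.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same units, e.g. %)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values from the table above.
reported_f = f_measure(83.52, 76.58)
```

With the rounded table values this gives roughly 79.9, which matches the reported 79.89 up to rounding of the precision and recall inputs.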