Mask R-CNN for Text Detection

This project uses a text detector based on Mask R-CNN, whose design is mainly inspired by fully convolutional networks. First, a CNN is used to detect text blocks, from which character candidates are extracted. Then an FPN is used to predict the corresponding segmentation masks. Finally, the segmentation masks are used to find suitable rectangular bounding boxes for the text instances.
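The last step above, turning a predicted mask into a rectangle, can be sketched as follows. This is a minimal NumPy sketch, not this repository's code: it assumes each instance mask is a 2-D boolean array, and the helper names `mask_to_box` and `box_to_quad` are hypothetical. It computes the tightest axis-aligned box and expands it into the four-corner form used by ICDAR-style results; for rotated text, OpenCV's `cv2.minAreaRect` applied to the mask's pixel coordinates is a common alternative.

```python
from typing import List, Optional, Tuple

import numpy as np


def mask_to_box(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Return the tightest axis-aligned box (x_min, y_min, x_max, y_max)
    around the nonzero pixels of a 2-D mask, or None if the mask is empty."""
    rows = np.any(mask, axis=1)  # True for rows that contain text pixels
    cols = np.any(mask, axis=0)  # True for columns that contain text pixels
    if not rows.any():
        return None
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return int(x_min), int(y_min), int(x_max), int(y_max)


def box_to_quad(box: Tuple[int, int, int, int]) -> List[int]:
    """Expand a box into 8 numbers x1,y1,...,x4,y4 in clockwise order:
    top-left, top-right, bottom-right, bottom-left (hypothetical helper)."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]


if __name__ == "__main__":
    # Toy 6x8 mask with one text instance covering rows 1-3 and columns 2-5.
    mask = np.zeros((6, 8), dtype=bool)
    mask[1:4, 2:6] = True
    box = mask_to_box(mask)
    print(box)               # (2, 1, 5, 3)
    print(box_to_quad(box))  # [2, 1, 5, 1, 5, 3, 2, 3]
```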
The provided pre-trained model targets the ICDAR 2017 Incidental Scene Text Detection Challenge and was trained using only the training images from ICDAR 2017 and 2019.
This is an implementation of Mask R-CNN in Python 3, Keras, and TensorFlow. The model generates a bounding box and a segmentation mask for each object instance in the image. It is built on a Feature Pyramid Network (FPN) with a ResNet101 backbone.
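To make the FPN + ResNet101 combination more concrete, here is a minimal NumPy sketch of the FPN top-down pathway. It assumes channels-last backbone maps C2 to C5 whose spatial size halves at each stage; the function names and random weights are hypothetical. A real FPN learns these 1x1 convolutions and also applies a 3x3 convolution to each merged map, which is omitted here, and this is not the Keras code used in this repository.

```python
import numpy as np


def lateral(c: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: project an (H, W, C_in) map to (H, W, C_out)
    using a (C_in, C_out) weight matrix."""
    return c @ w


def upsample2x(p: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) map."""
    return p.repeat(2, axis=0).repeat(2, axis=1)


def fpn_top_down(features, weights):
    """Build pyramid maps from backbone maps ordered fine to coarse (C2..C5).
    Each coarser level is upsampled and added to the next finer lateral map."""
    p = lateral(features[-1], weights[-1])  # start from the coarsest level
    pyramid = [p]
    for c, w in zip(reversed(features[:-1]), reversed(weights[:-1])):
        p = lateral(c, w) + upsample2x(p)
        pyramid.append(p)
    return pyramid[::-1]  # reorder fine to coarse (P2..P5)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # ResNet stage outputs C2..C5 for a 64x64 input (strides 4, 8, 16, 32).
    sizes, channels = [16, 8, 4, 2], [256, 512, 1024, 2048]
    feats = [rng.standard_normal((s, s, c)) for s, c in zip(sizes, channels)]
    ws = [rng.standard_normal((c, 256)) * 0.01 for c in channels]
    for p in fpn_top_down(feats, ws):
        print(p.shape)  # (16, 16, 256), then (8, 8, 256), (4, 4, 256), (2, 2, 256)
```

Every pyramid level ends up with the same channel count, which is what lets a single set of RPN and mask heads run on all scales.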

Models trained on ICDAR 2017 (training set) + ICDAR 2019 (training set): Download link

If you've downloaded the pre-trained model, you can run

python test.py

A text file will then be written to the output path.
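This README does not document the layout of the output file. The sketch below assumes one detection per line written as eight comma-separated corner coordinates, x1,y1,x2,y2,x3,y3,x4,y4, which is the ICDAR 2015 incidental-text submission format; any extra fields after the eighth (for example a score) are ignored. The function name `parse_detections` is hypothetical.

```python
from typing import List, Tuple

Quad = List[Tuple[int, int]]


def parse_detections(text: str) -> List[Quad]:
    """Parse lines of 'x1,y1,x2,y2,x3,y3,x4,y4' into lists of (x, y) corners.
    Blank lines are skipped; fields after the eighth are ignored."""
    quads = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        values = [int(float(v)) for v in line.split(",")[:8]]
        if len(values) != 8:
            raise ValueError(f"expected 8 coordinates, got: {line!r}")
        # Pair up consecutive numbers into (x, y) corner points.
        quads.append(list(zip(values[0::2], values[1::2])))
    return quads


if __name__ == "__main__":
    sample = "10,20,110,20,110,60,10,60\n300,40,380,45,378,90,298,85\n"
    for quad in parse_detections(sample):
        print(quad)
```

To use it on a real result file, read the file's contents (for example with `open(path).read()`) and pass the string to `parse_detections`.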

Method	Precision (%)	Recall (%)	F-measure (%)
Mask R-CNN-resnet101	83.52	76.58	79.89
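The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R), the standard definition used in ICDAR text detection evaluation. The small helper below (a hypothetical name, not part of this repository) recomputes it from the table values.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall from the table above, in percent.
    print(round(f_measure(83.52, 76.58), 2))
```

Recomputing from the two-decimal table values gives about 79.90, which is consistent with the reported 79.89 up to rounding of the last digit.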

cuppersd/MASKRCNN-TEXT-DETECTION