RFB-Text-Detection

A Dense Text Detection model using Receptive Field Blocks

Primary language: C++. License: MIT.

Receptive Field Blocks Text Detection Module (RFBTD Module)

Introduction

This repo contains a text detection model based on Receptive Field Blocks. The model provides a dense receptive field for predicting text boxes in dense natural scene images such as documents and articles.

The model is also inspired by EAST: An Efficient and Accurate Scene Text Detector, from which the RRBOX part and the loss function are taken.

The features of the model are summarized below:

  • Keras implementation for lucid and clean code.

  • Backbone: ResNet50

  • Inference time for 720p images:

    GPU VERSION

    • Graphics Card: MX130
    • Inference Time: 700ms
    • Batch Size: 1

    CPU VERSION

    • CPU: Intel i7-8550U CPU @ 1.80GHz
    • Inference Time: 1750ms
    • Batch Size: 1
  • The pre-trained model provided achieves an F1-score of 47.09 (single crop, resize only) on ICDAR 2015, although it was not trained on ICDAR 2015. To improve accuracy, fine-tune on ICDAR 2015 and predict with multiple crops.

  • The model is tuned for predicting text boxes in natural scene documents such as bank statements, forms, and receipts, so that OCR can then be run on these text boxes.
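
For reference, the F1-score quoted in the feature list above is the harmonic mean of detection precision and recall. A minimal sketch follows; the helper name `f1_score` is hypothetical, and the precision and recall values are illustrative only, not results reported for this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not measurements of this model.
print(round(f1_score(0.5, 0.25), 4))
```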

Please cite these papers (paper, paper) if you find this useful.

Contents

  1. Live Demo
  2. Installation
  3. Download Pre-Trained Model
  4. Demo
  5. Eval
  6. Train
  7. Examples

Live Demo

The model is hosted on my server, so if you want to try it out live, click here. The hosted model resizes input images to 480p, because the model does not fit on my single-core, 512 MB RAM server at full resolution. Accuracy will suffer as a result of the small input size.
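
The downscaling mentioned above could be computed roughly as follows. This is a sketch under assumptions, not the server's actual code: the helper name `resized_shape` is hypothetical, "480p" is taken to mean a target height of 480 pixels, and rounding each side to a multiple of 32 is a common convention for convolutional backbones rather than something this repo states.

```python
def resized_shape(height: int, width: int, target_height: int = 480, multiple: int = 32):
    """Return (new_height, new_width) scaled so the height is close to target_height.

    Hypothetical helper: keeps the aspect ratio, then rounds both sides to the
    nearest multiple of `multiple` (and never below one multiple).
    """
    scale = target_height / height
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    return new_h, new_w


# A 720p frame (720 rows x 1280 columns) scaled down towards 480p.
print(resized_shape(720, 1280))
```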

Installation

  • Requirements from requirements.txt

    tensorflow==1.13.1

    Keras==2.2.4

    numpy==1.16.2

    opencv_contrib_python==4.0.0.21

    plumbum==1.6.7
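
To check an environment against these pins, something along the following lines may help. It is only a sketch: the helper names `parse_pins` and `find_mismatches` are hypothetical, the pin list is copied from the requirements above, and the "installed" text stands in for output such as that of `pip freeze`.

```python
def parse_pins(text):
    """Parse 'name==version' lines into a {normalized name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Lowercase and treat '_' like '-' so both spellings compare equal.
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def find_mismatches(required, installed):
    """Return {name: (required_version, installed_version_or_None)} for pins that differ."""
    return {
        name: (version, installed.get(name))
        for name, version in required.items()
        if installed.get(name) != version
    }


REQUIRED = parse_pins("""
tensorflow==1.13.1
Keras==2.2.4
numpy==1.16.2
opencv_contrib_python==4.0.0.21
plumbum==1.6.7
""")

# Made-up example of an environment that differs from the pins.
installed = parse_pins("tensorflow==1.13.1\nKeras==2.2.4\nnumpy==1.16.4\n")
print(find_mismatches(REQUIRED, installed))
```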

Download

The pre-trained model is available at this link (Google Drive).

Demo

If you've downloaded the pre-trained model, run

python3 run_demo.py

The input images are read from test_images/input_images, and the predictions are written to test_images/predicted_images.
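
The input/output convention above can be sketched as below. This is not the code of run_demo.py: the helper `plan_outputs` and the set of image extensions are assumptions made for illustration, and only the two directory names come from this README.

```python
from pathlib import Path

# Assumed set of image extensions; the demo script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def plan_outputs(input_dir, output_dir):
    """Pair each image in input_dir with a same-named path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    if not input_dir.is_dir():
        return []
    return [
        (path, output_dir / path.name)
        for path in sorted(input_dir.iterdir())
        if path.suffix.lower() in IMAGE_EXTENSIONS
    ]


for src, dst in plan_outputs("test_images/input_images", "test_images/predicted_images"):
    print(src, "->", dst)
```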

Eval

If you want to benchmark it on ICDAR 2015, run

python eval.py

The images are read from eval_images/evaluation_images, and the predictions are written as text files to eval_images/predicted_boxes. The output format conforms to the ICDAR text detection challenge format.
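
As a rough illustration of that format: in the ICDAR 2015 text localization task, a result file holds one detection per line as eight comma-separated integer coordinates, the four corners of the quadrilateral. The sketch below assumes that layout; the helper names are hypothetical and the code is not taken from eval.py.

```python
def format_quad(quad):
    """Format four (x, y) corners as 'x1,y1,x2,y2,x3,y3,x4,y4'."""
    if len(quad) != 4:
        raise ValueError("expected four (x, y) corners")
    return ",".join(str(int(round(value))) for point in quad for value in point)


def format_detections(quads):
    """Join detections into the text of one result file, one box per line."""
    return "".join(format_quad(quad) + "\n" for quad in quads)


# Made-up box used only to show the layout.
boxes = [[(10, 20), (110, 20), (110, 50), (10, 50)]]
print(format_detections(boxes), end="")
```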

Train

The training code has not been uploaded yet, but it will be posted in the upcoming days.

Examples

Here are some examples. The images on the left are the source images, and the images on the right are the same images overlaid with the predicted bounding boxes.

image_1 image_2 image_3 image_4

Issues

If you encounter any issues, please open an issue on the issue tracker.

Model Architecture

If you want more insight into the RF Block, take a look at the model architecture. image_1
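
To make the notion of a receptive field concrete, the sketch below computes the effective kernel size of a dilated convolution and the receptive field of a stack of stride-1 convolutions. The branch configurations are hypothetical and chosen only for illustration; they are not read from this model's architecture.

```python
def effective_kernel(kernel, dilation=1):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convs given (kernel, dilation) pairs."""
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf


# Hypothetical branches: a plain conv followed by a (possibly dilated) 3x3 conv.
branches = {
    "branch_a": [(1, 1), (3, 1)],
    "branch_b": [(3, 1), (3, 3)],
    "branch_c": [(5, 1), (3, 5)],
}
for name, layers in branches.items():
    print(name, receptive_field(layers))
```

Combining branches whose receptive fields differ is the general idea behind multi-branch blocks of this kind: each output position aggregates context at several scales.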

And a huge shout-out to argman/EAST for providing the C++ NMS.