Text Detector for OCR

This text detector performs text localization. It is built on the RetinaNet architecture and applies techniques from TextBoxes++.

Train

SynthText

[raw data & tfrecord](https://drive.google.com/drive/folders/1Nj07w3DEL95R3qaIJl8qv6Z9pRb2H405?usp=sharing)

```
cd text_detector/sample/SynthText
python3 train.py --train_dataset="/path/to/tfrecord/"
```

balloon from Mask_RCNN

[raw data & tfrecord](https://drive.google.com/drive/folders/1lUrDCWLtj2oL78SRIgwgwtIl1iA6CuHT?usp=sharing)

```
cd text_detector/sample/balloon
python3 train.py --train_dataset="/path/to/tfrecord/"
```

TextBoxes++

  • Based on the SSD architecture; each anchor (default) box also gets a vertically offset copy so that densely packed text lines are covered by bbox proposals.
  • Same structure as TextBoxes, plus regression offsets for the quadrilateral (QuadBox) vertices.
  • Anchor offset goes from 4-d (x, y, w, h) to (4+8)-d: (x, y, w, h) plus the four vertices (x0, y0, x1, y1, x2, y2, x3, y3).
  • The last conv layer uses a 3x5 kernel, giving a receptive field better matched to long, wide quad boxes.

RetinaNet

  • Simple one-stage object detector with good performance.
  • A Feature Pyramid Network (FPN) lets features from multiple levels be used.
  • Output: 1-d score + 4-d anchor box offset per anchor.
  • Classification loss: focal loss; localization loss: smooth L1 loss.
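The two losses named in the last bullet can be sketched in plain Python as follows. This is an illustration, not the repository's TensorFlow code: the function names, the per-anchor list layout, α = 0.25 and γ = 2 (the defaults from the RetinaNet paper), and normalising both losses by the number of positive anchors (also as in RetinaNet) are assumptions here. Anchors labelled -1 (the "ignore" label from the Encode step) contribute to neither loss.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)); avoids overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def binary_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Sum of sigmoid focal loss over anchors (assumed defaults: alpha=0.25, gamma=2).

    labels: 1 = text, 0 = non-text, -1 = ignored (skipped).
    """
    total = 0.0
    for z, y in zip(logits, labels):
        if y == -1:
            continue
        # p_t is the predicted probability of the true class; 1 - sigmoid(z) = sigmoid(-z)
        log_pt = log_sigmoid(z) if y == 1 else log_sigmoid(-z)
        pt = math.exp(log_pt)
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights easy, well-classified anchors
        total += -alpha_t * (1.0 - pt) ** gamma * log_pt
    return total


def smooth_l1(pred, target):
    """Smooth L1 summed over coordinates: 0.5*d^2 if |d| < 1 else |d| - 0.5."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(logits, labels, box_preds, box_targets):
    """Return (cls_loss, loc_loss), both divided by the number of positive anchors."""
    num_pos = max(1, sum(1 for y in labels if y == 1))
    cls_loss = binary_focal_loss(logits, labels)
    # Localization loss is computed on positive (text) anchors only
    loc_loss = sum(smooth_l1(p, t)
                   for p, t, y in zip(box_preds, box_targets, labels) if y == 1)
    return cls_loss / num_pos, loc_loss / num_pos
```

Setting `gamma=0` turns the focal term into an α-weighted binary cross-entropy, which is a quick way to see how much the focal term suppresses the many easy non-text anchors.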

Encode

  1. Define anchor boxes at each grid cell.
  2. Compute the IoU between every GT box and every anchor box.
  3. Assign each anchor box to the GT box with which it has the largest IoU.
  4. Label each anchor by that IoU: IoU > 0.5: text (label = 1) / 0.4 < IoU < 0.5: ignore (label = -1) / IoU < 0.4: non-text (label = 0).
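The four encoding steps can be sketched in plain Python as below. This is a minimal illustration, not the repository's encoder: the function names (`make_anchors`, `encode`, etc.), the (cx, cy, w, h) anchor layout, the half-cell vertical offset, matching against the axis-aligned bounding rectangle of each GT quad, and TextBoxes++-style vertex offsets measured from the anchor's corners are all assumptions. Because the thresholds above leave IoU exactly 0.4 or 0.5 unspecified, those values are treated as ignored here. Variance scaling of the targets is omitted.

```python
import math


def make_anchors(feat_h, feat_w, stride, sizes, aspect_ratios, vertical_offset=True):
    """Step 1 (hypothetical layout): anchors as (cx, cy, w, h) at every grid cell.

    Assumption: with vertical_offset=True each anchor gets a copy shifted down
    by half a cell, in the spirit of the TextBoxes vertical offset.
    """
    shifts = [0.0, 0.5 * stride] if vertical_offset else [0.0]
    anchors = []
    for i in range(feat_h):          # grid row
        for j in range(feat_w):      # grid column
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for dy in shifts:
                for s in sizes:
                    for r in aspect_ratios:  # r = width / height
                        anchors.append((cx, cy + dy, s * math.sqrt(r), s / math.sqrt(r)))
    return anchors


def to_corners(box):
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def quad_to_rect(quad):
    """Axis-aligned bounding rectangle of a quad (x0, y0, x1, y1, x2, y2, x3, y3)."""
    xs, ys = quad[0::2], quad[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(anchors, gt_quads, pos_thr=0.5, neg_thr=0.4):
    """Steps 2-4: return (labels, targets), one label and one 12-d target per anchor."""
    gt_rects = [quad_to_rect(q) for q in gt_quads]
    labels, targets = [], []
    for anchor in anchors:
        a_rect = to_corners(anchor)
        ious = [iou(a_rect, g) for g in gt_rects]            # step 2
        if not ious:                                           # no text in the image
            labels.append(0)
            targets.append([0.0] * 12)
            continue
        k = max(range(len(ious)), key=lambda n: ious[n])       # step 3: best-matching GT
        best = ious[k]
        if best > pos_thr:                                     # step 4
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:                                                  # 0.4 <= IoU <= 0.5
            labels.append(-1)
        if labels[-1] != 1:
            targets.append([0.0] * 12)
            continue
        # 4-d box offset (x, y, w, h) relative to the anchor
        cx, cy, w, h = anchor
        gx0, gy0, gx1, gy1 = gt_rects[k]
        t = [((gx0 + gx1) / 2 - cx) / w, ((gy0 + gy1) / 2 - cy) / h,
             math.log((gx1 - gx0) / w), math.log((gy1 - gy0) / h)]
        # 8-d quad offset: each GT vertex relative to the matching anchor corner
        ax0, ay0, ax1, ay1 = a_rect
        corners = [(ax0, ay0), (ax1, ay0), (ax1, ay1), (ax0, ay1)]  # TL, TR, BR, BL
        quad = gt_quads[k]
        for n, (qx, qy) in enumerate(corners):
            t += [(quad[2 * n] - qx) / w, (quad[2 * n + 1] - qy) / h]
        targets.append(t)
    return labels, targets
```

For example, on a 2x2 grid with stride 16 and one 16x16 anchor per cell, a GT quad `(2, 0, 18, 0, 18, 16, 2, 16)` overlaps the top-left anchor with IoU ≈ 0.78, so only that anchor is labelled text; shifting the quad to start at x = 6 drops the IoU to ≈ 0.45 and the anchor becomes ignored.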

Todo list:

  • Training
    • Training Code
    • Model Save
    • Step Decay Learning Rate
    • Multiple GPU
  • Make Data
    • Make SynthText tfrecord
    • Make ICDAR13 tfrecord
    • Make ICDAR15 tfrecord
    • Make toy dataset (balloon) from Mask_RCNN
  • Network
    • ResNet50, ResNet101
    • Feature Pyramid Network
    • Task Specific Network
    • Trainable BatchNorm (?)
    • Freeze BatchNorm (?)
    • GroupNorm
    • (binary) focal loss
    • Slim backbone pretrained weights
  • Utils
    • Add vertical offset
    • Validation inference image visualization using TensorBoard
    • Add augmentation
    • Add evaluation code (mAP): unstable
    • Quad-box NMS (NumPy version)
    • Combine the two NMS methods as described in the paper
    • Visualization

Environment

  • OS: Ubuntu 16.04.4 LTS
  • GPU: NVIDIA GTX 1080 Ti (11GB)
  • Python: 3.6.6
  • TensorFlow: 1.4.0
  • Polygon