Text Detector for OCR

This text detector performs text localization. It is built on the RetinaNet architecture and applies techniques from TextBoxes++.

Train

SynthText

[raw data & tfrecord](https://drive.google.com/drive/folders/1Nj07w3DEL95R3qaIJl8qv6Z9pRb2H405?usp=sharing)

```
cd text_detector/sample/SynthText
python3 train.py --train_dataset="/path/to/tfrecord/"
```

balloon from Mask_RCNN

[raw data & tfrecord](https://drive.google.com/drive/folders/1lUrDCWLtj2oL78SRIgwgwtIl1iA6CuHT?usp=sharing)

```
cd text_detector/sample/balloon
python3 train.py --train_dataset="/path/to/tfrecord/"
```

An SSD-style structure is used, and vertical offsets are added to the anchor boxes used for bbox proposals.
The structure is the same as TextBoxes, except that offsets for the quadrilateral box (QuadBox) are also regressed:
- Anchor box offsets: 4-d (xywh) -> (4+8)-d (xywh + x0y0x1y1x2y2x3y3), i.e. the box offsets plus offsets for the four corner points.
- Last conv: 3x5 kernel, giving a receptive field suited to the elongated shape of quad boxes.
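A minimal NumPy sketch of the (4+8)-d offset encoding, assuming a TextBoxes++-style parameterization: center offsets scaled by the anchor size, log-scaled width and height, and each corner offset measured from the matching corner of the anchor box (top-left, top-right, bottom-right, bottom-left) and scaled by the anchor width (x) or height (y). The function names and the absence of SSD-style variance scaling are assumptions, not this repo's code.

```python
import numpy as np


def anchor_corners(anchors):
    """(N, 4) anchors (cx, cy, w, h) -> (N, 8) corners x0, y0, ..., x3, y3,
    ordered top-left, top-right, bottom-right, bottom-left (image coordinates)."""
    cx, cy, w, h = anchors.T
    x_min, x_max = cx - w / 2, cx + w / 2
    y_min, y_max = cy - h / 2, cy + h / 2
    return np.stack([x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max], axis=1)


def _corner_scale(anchors):
    """(N, 8) scale factors: anchor width for x-coordinates, height for y."""
    return np.tile(anchors[:, 2:4], (1, 4))


def encode(anchors, gt_boxes, gt_quads):
    """Anchors and GT boxes are (N, 4) (cx, cy, w, h); GT quads are (N, 8).
    Returns (N, 12) regression targets: 4 box offsets + 8 corner offsets."""
    acx, acy, aw, ah = anchors.T
    gcx, gcy, gw, gh = gt_boxes.T
    box = np.stack([(gcx - acx) / aw, (gcy - acy) / ah,
                    np.log(gw / aw), np.log(gh / ah)], axis=1)
    quad = (gt_quads - anchor_corners(anchors)) / _corner_scale(anchors)
    return np.concatenate([box, quad], axis=1)


def decode(anchors, offsets):
    """Inverse of encode: (N, 12) offsets -> ((N, 4) boxes, (N, 8) quads)."""
    acx, acy, aw, ah = anchors.T
    dx, dy, dw, dh = offsets[:, :4].T
    box = np.stack([acx + dx * aw, acy + dy * ah,
                    aw * np.exp(dw), ah * np.exp(dh)], axis=1)
    quad = anchor_corners(anchors) + offsets[:, 4:] * _corner_scale(anchors)
    return box, quad
```

`decode` inverts `encode`, so predicted offsets can be turned back into a box and a quad for each anchor. Measuring corners from the anchor's own corners keeps the targets near zero for text that is close to axis-aligned.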

Anchor boxes are defined for every grid cell, and the IoU between each GT box and each anchor box is computed.
Each anchor box is assigned to the GT box with which it has the largest IoU, and is labeled according to that IoU:
- IoU > 0.5: text (label = 1)
- 0.4 < IoU < 0.5: ignored (label = -1)
- IoU < 0.4: non-text (label = 0)
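The anchor definition and assignment step can be sketched in NumPy as below. This is an illustration under assumptions: boxes are axis-aligned (x1, y1, x2, y2); the boundary cases follow the RetinaNet convention (IoU >= 0.5 positive, IoU < 0.4 negative), since the text leaves them open; and the `vertical_offset` option, which duplicates each anchor shifted down by half a cell, is one reading of the TextBoxes-style vertical offset. `make_anchors`, `iou_matrix`, and `assign_labels` are hypothetical names.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride, sizes, aspect_ratios, vertical_offset=True):
    """Anchors (x1, y1, x2, y2) for every cell of a feat_h x feat_w feature grid.

    With vertical_offset=True, every anchor is duplicated and shifted down by
    half a cell (an assumed TextBoxes-style densification)."""
    shifts = (0.0, 0.5) if vertical_offset else (0.0,)
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) * stride
            for dy in shifts:
                cy = (i + 0.5 + dy) * stride
                for size in sizes:
                    for ar in aspect_ratios:  # ar = width / height
                        w, h = size * np.sqrt(ar), size / np.sqrt(ar)
                        boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes, dtype=np.float64)


def iou_matrix(anchors, gt):
    """Pairwise IoU between (N, 4) anchors and (M, 4) GT boxes."""
    x1 = np.maximum(anchors[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)


def assign_labels(anchors, gt, pos_thresh=0.5, neg_thresh=0.4):
    """Label each anchor 1 (text), -1 (ignore) or 0 (non-text) from its best IoU.

    Assumes at least one GT box; returns (labels, index of matched GT box)."""
    ious = iou_matrix(anchors, gt)              # (N, M)
    matched = ious.argmax(axis=1)               # best GT box per anchor
    best_iou = ious.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int32)  # default: ignore
    labels[best_iou >= pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0
    return labels, matched
```

Anchors whose best IoU falls between the two thresholds keep label -1, so they contribute neither a positive nor a negative term to the classification loss.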

Training
- Training code
- Model saving
- Step-decay learning rate
- Multi-GPU training
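For the step-decay learning rate, a minimal sketch: the rate is multiplied by `factor` each time training reaches a boundary step. The base rate, boundaries, and factor below are hypothetical placeholders, not values taken from this repo.

```python
def step_decay_lr(step, base_lr=1e-3, boundaries=(60000, 80000), factor=0.1):
    """Learning rate at `step`: base_lr multiplied by `factor` once for every
    boundary that `step` has reached (hypothetical default values)."""
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= factor
    return lr
```

In TensorFlow, a similar piecewise-constant schedule can be built with `tf.keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)`.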
Make Data
- Make SynthText tfrecord
- Make ICDAR13 tfrecord
- Make ICDAR15 tfrecord
- Make toy dataset (balloon) from Mask_RCNN
Network
- ResNet50, ResNet101
- Feature Pyramid Network
- Task-specific network
- Trainable BatchNorm (?)
- Freeze BatchNorm (?)
- GroupNorm
- (Binary) focal loss
- Slim backbone pretrained weights
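The binary focal loss can be sketched in NumPy as follows, assuming the RetinaNet defaults (alpha = 0.25, gamma = 2), normalization by the number of positive anchors, and the labels from the anchor assignment above (1 text, 0 non-text, -1 ignored). `binary_focal_loss` is a hypothetical name; the repo's TensorFlow loss may be organized differently.

```python
import numpy as np


def binary_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss over anchors; labels are 1, 0, or -1 (ignored).

    Returns the summed loss divided by max(1, number of positive anchors)."""
    valid = labels >= 0                      # drop ignored anchors
    x = logits[valid].astype(np.float64)
    y = labels[valid].astype(np.float64)
    p = 1.0 / (1.0 + np.exp(-x))             # sigmoid probability of "text"
    p_t = np.where(y == 1, p, 1.0 - p)       # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # log(p_t) computed stably as log(sigmoid(z)) with z = x (text) or -x (non-text)
    z = np.where(y == 1, x, -x)
    log_p_t = -np.logaddexp(0.0, -z)
    loss = -alpha_t * (1.0 - p_t) ** gamma * log_p_t
    num_pos = max(1.0, float((y == 1).sum()))
    return loss.sum() / num_pos
```

The `(1 - p_t) ** gamma` factor shrinks the loss of easy, confidently classified anchors, which keeps the many background anchors from dominating training.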
Utils
- Add vertical offset
- Validation inference image visualization in TensorBoard
- Add augmentation
- Add evaluation code (mAP); currently unstable
- Quad-box NMS (NumPy implementation)
- Combine the two NMS methods as described in the paper
- Visualization
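A minimal NumPy sketch of quad NMS, combined with one reading of the two-stage ("cascaded") NMS from the TextBoxes++ paper: a cheap NMS on each quad's minimum horizontal bounding rectangle, then a slower NMS on the quads themselves. The thresholds (0.5, then 0.2) and all function names are assumptions. Quad IoU uses Sutherland-Hodgman polygon clipping, which is only correct when the quads are convex.

```python
import numpy as np


def polygon_area(pts):
    """Absolute shoelace area of a (K, 2) polygon."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def _cross(a, b, p):
    """> 0 if p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(a, b, p, q):
    """Point where segment p -> q crosses the line through a and b."""
    d1, d2 = _cross(a, b, p), _cross(a, b, q)
    t = d1 / (d1 - d2)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: intersection of `subject` with the CONVEX polygon `clip`."""
    x, y = clip[:, 0], clip[:, 1]
    if np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)) < 0:
        clip = clip[::-1]  # orient the clip polygon counter-clockwise
    output = [tuple(p) for p in subject]
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, output = output, []
        for j, cur in enumerate(inp):
            prev = inp[j - 1]
            if _cross(a, b, cur) >= 0:          # cur is inside this edge
                if _cross(a, b, prev) < 0:
                    output.append(_intersect(a, b, prev, cur))
                output.append(cur)
            elif _cross(a, b, prev) >= 0:       # leaving the inside region
                output.append(_intersect(a, b, prev, cur))
        if not output:
            break
    return np.array(output, dtype=float).reshape(-1, 2)


def quad_iou(q1, q2):
    """IoU of two quads given as (8,) arrays x0, y0, ..., x3, y3."""
    p1, p2 = q1.reshape(4, 2), q2.reshape(4, 2)
    inter_pts = clip_polygon(p1, p2)
    inter = polygon_area(inter_pts) if len(inter_pts) >= 3 else 0.0
    union = polygon_area(p1) + polygon_area(p2) - inter
    return inter / union if union > 0 else 0.0


def rect_iou(r1, r2):
    """IoU of two axis-aligned rectangles (x1, y1, x2, y2)."""
    iw = max(0.0, min(r1[2], r2[2]) - max(r1[0], r2[0]))
    ih = max(0.0, min(r1[3], r2[3]) - max(r1[1], r2[1]))
    inter = iw * ih
    union = ((r1[2] - r1[0]) * (r1[3] - r1[1])
             + (r2[2] - r2[0]) * (r2[3] - r2[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_fn, thresh):
    """Greedy NMS; returns kept indices in descending score order."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        ious = np.array([iou_fn(boxes[i], boxes[j]) for j in rest])
        order = rest[ious <= thresh]
    return np.array(keep, dtype=int)


def cascaded_nms(quads, scores, rect_thresh=0.5, quad_thresh=0.2):
    """Rectangle NMS on each quad's bounding box, then quad NMS on the survivors."""
    pts = quads.reshape(-1, 4, 2)
    rects = np.concatenate([pts.min(axis=1), pts.max(axis=1)], axis=1)
    keep1 = greedy_nms(rects, scores, rect_iou, rect_thresh)
    keep2 = greedy_nms(quads[keep1], scores[keep1], quad_iou, quad_thresh)
    return keep1[keep2]
```

The rectangle stage discards most duplicates cheaply, so the more expensive polygon IoU only runs on the detections that survive it.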