Mask Scoring R-CNN with TensorFlow 2.0

GOAL

Implement Mask Scoring R-CNN in pure TensorFlow 2.0

Introduction

R-CNN: Regions with CNN features (2014) (Paper)

Steps:

Generate 2000 candidate regions
- Method: selective search, a traditional (non-learned) method that divides the image into several parts.
Compute feature maps for the 2000 candidates
- NN: AlexNet (2012)
Classify the feature maps of the 2000 candidates with an SVM and adjust each bounding box with bounding box regression

Advantages of R-CNN:

Uses a CNN to extract features
Refines the bounding boxes with bounding box regression

Disadvantages of R-CNN:

Selective search is time-consuming
Feature maps must be computed for all 2000 RoIs (Regions of Interest)
Training the three parts separately requires massive storage

Fast R-CNN (2015)(Paper)

Steps:

Generate 2000 candidate regions with selective search (same as R-CNN)
Compute the feature maps only once, for the original image
Extract the features of each RoI with the RoI Pooling Layer
Perform classification and bbox prediction with an FC Layer

Advantages of Fast R-CNN:

Instead of computing 2000 feature maps, Fast R-CNN computes only one feature map and extracts the desired features with the RoI Pooling Layer
Apart from selective search, all parts are trained end-to-end

Disadvantages of Fast R-CNN:

Still relies on selective search

Faster R-CNN (2016)(Paper)

Steps:

Compute the feature maps only once, for the original image
Feed the feature maps into the RPN (Region Proposal Network) to generate proposals
Feed the proposals and feature maps into the RoI Pooling Layer, then get the bbox and category with an FC Layer

Advantages of Faster R-CNN:

Achieved end-to-end training

Mask R-CNN (2017)(Paper)

...

Mask scoring R-CNN (2019)(Paper)

...

Details of NN

RPN ( Region Proposal Network )

The RPN works like a convolution: it slides a window over the feature map and, at each position, generates 9 preset anchor boxes (three areas 128^2, 256^2, 512^2 and three ratios 1:1, 1:2, 2:1). If the size of the feature map is 40*60, then the total number of anchors is 40*60*9. For each anchor position (pixel or point), there are two head networks:

conv(1,1,18): classifies each anchor box as foreground or background. Since there are 9 anchor boxes per position and each needs two scores, one for foreground and one for background, the output has 9*2=18 channels. The values are scores, and a softmax over each pair of scores selects one class.
- When training:
  - foreground: IoU of anchor box and ground truth > 0.7
  - background: IoU of anchor box and ground truth < 0.3
  - discard the anchors with 0.3 < IoU < 0.7
  - Randomly choose 128 foregrounds and 128 backgrounds
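
The anchor layout and the labeling rule above can be sketched in a few lines. This is a minimal NumPy illustration rather than the repository's TensorFlow code: the function names, the feature stride of 16, the reading of each ratio as height/width, and the ground-truth box are all assumptions made for the example.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in image pixels.

    stride=16 is an assumed backbone stride; ratio is taken as height / width.
    """
    # Nine base shapes: each keeps area scale**2 while changing the aspect ratio.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))   # (width, height)
                       for s in scales for r in ratios])
    # Centre of every feature-map cell, mapped back to image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centres = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (H*W, 2)
    half = shapes / 2.0
    top_left = centres[:, None, :] - half[None, :, :]      # (H*W, 9, 2)
    bottom_right = centres[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

def iou_with_box(anchors, gt):
    """IoU of every anchor with one ground-truth box (x1, y1, x2, y2)."""
    iw = np.clip(np.minimum(anchors[:, 2], gt[2]) - np.maximum(anchors[:, 0], gt[0]), 0, None)
    ih = np.clip(np.minimum(anchors[:, 3], gt[3]) - np.maximum(anchors[:, 1], gt[1]), 0, None)
    inter = iw * ih
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def label_anchors(iou, fg_thresh=0.7, bg_thresh=0.3):
    """1 = foreground, 0 = background, -1 = discarded during training."""
    labels = np.full(iou.shape, -1, dtype=np.int64)
    labels[iou > fg_thresh] = 1
    labels[iou < bg_thresh] = 0
    return labels

def sample_anchors(labels, num_fg=128, num_bg=128, seed=0):
    """Randomly pick up to num_fg foreground and num_bg background indices."""
    rng = np.random.default_rng(seed)
    fg = np.flatnonzero(labels == 1)
    bg = np.flatnonzero(labels == 0)
    fg = rng.choice(fg, size=min(num_fg, fg.size), replace=False)
    bg = rng.choice(bg, size=min(num_bg, bg.size), replace=False)
    return fg, bg

anchors = make_anchors(40, 60)            # 40*60*9 anchors
gt_box = anchors[9 * (20 * 60 + 30)]      # made-up ground truth: identical to one anchor
labels = label_anchors(iou_with_box(anchors, gt_box))
fg_idx, bg_idx = sample_anchors(labels)
print(anchors.shape, fg_idx.size, bg_idx.size)
```

With a single small ground-truth box, far fewer than 128 anchors pass the 0.7 threshold, so the sketch simply returns as many foregrounds as exist.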
conv(1,1,36): refines the bounding box. Since each point has 9 preset anchors and each needs (tx, ty, th, tw) to adjust its bounding box, the output has 9*4=36 channels.
- Current position: (x, y), size: (H, W)
- After adjustment: position: (x+tx, y+ty), size: (H * th, W * tw)
- Loss function: Smooth L1 loss
- Only the 128 positive anchors are used to train this head
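
As a rough illustration of the box adjustment and the loss listed above, the following NumPy sketch applies (tx, ty, th, tw) exactly as written in this section and evaluates a Smooth L1 loss on positive anchors only. The function names and the `beta` parameter are hypothetical, and the repository's TensorFlow code may differ. Note also that the Faster R-CNN paper uses a slightly different parameterization (offsets scaled by the anchor size and sizes predicted in log space); the simpler form from this README is used here.

```python
import numpy as np

def apply_deltas(boxes, deltas):
    """Adjust boxes given as (x, y, H, W) with deltas given as (tx, ty, th, tw).

    Follows the simplified form in this README:
    position (x + tx, y + ty), size (H * th, W * tw).
    """
    x, y, h, w = boxes.T
    tx, ty, th, tw = deltas.T
    return np.stack([x + tx, y + ty, h * th, w * tw], axis=1)

def smooth_l1(pred, target, beta=1.0):
    """Element-wise Smooth L1: quadratic below beta, linear above it."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

def rpn_box_loss(pred_deltas, target_deltas, labels):
    """Mean Smooth L1 loss over positive anchors only (labels == 1)."""
    pos = labels == 1
    if not pos.any():
        return 0.0
    per_anchor = smooth_l1(pred_deltas[pos], target_deltas[pos]).sum(axis=1)
    return float(per_anchor.mean())

boxes = np.array([[100.0, 80.0, 128.0, 128.0]])    # one anchor: (x, y, H, W)
deltas = np.array([[4.0, -2.0, 1.5, 0.5]])         # made-up network output
adjusted = apply_deltas(boxes, deltas)

pred = np.array([[0.5, 2.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]])
target = np.zeros((2, 4))
labels = np.array([1, 0])                          # second anchor is background
loss = rpn_box_loss(pred, target, labels)
print(adjusted, loss)
```

The background anchor contributes nothing to the loss, which mirrors the rule that only positive anchors train the regression head.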

RoI Pooling Layer

Purpose: for each RoI, pick the corresponding features from the feature maps and feed them to the following FC Layer.
Procedure:

Get the features of the RoI
Reshape those features to fit the input shape of the FC Layer. For example, as shown in this figure, the RoI Pooling Layer resizes the features of each RoI to 7*7

Similar to the RPN, the bounding box regression that follows the RoI Pooling Layer is trained only on positive RoIs.
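
To make the "reshape to 7*7" step concrete, here is a minimal NumPy sketch of max-based RoI pooling. It assumes the RoI is already expressed in integer feature-map cells with exclusive end coordinates, and the function name `roi_pool` is hypothetical; it is not taken from this repository, which may implement the layer differently in TensorFlow.

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=7):
    """Max-pool one RoI to a fixed (out_size, out_size, C) tensor.

    feature_map: (H, W, C) array.
    roi: (x1, y1, x2, y2) in feature-map cells, x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, c = region.shape
    # Split the region into out_size x out_size bins of roughly equal size.
    ys = np.linspace(0, h, out_size + 1)
    xs = np.linspace(0, w, out_size + 1)
    out = np.empty((out_size, out_size, c), dtype=feature_map.dtype)
    for i in range(out_size):
        y_lo = int(np.floor(ys[i]))
        y_hi = max(int(np.ceil(ys[i + 1])), y_lo + 1)   # at least one row per bin
        for j in range(out_size):
            x_lo = int(np.floor(xs[j]))
            x_hi = max(int(np.ceil(xs[j + 1])), x_lo + 1)
            out[i, j] = region[y_lo:y_hi, x_lo:x_hi].max(axis=(0, 1))
    return out

# Toy feature map whose values encode their own position.
feature_map = np.arange(14 * 14 * 2, dtype=np.float32).reshape(14, 14, 2)
pooled_full = roi_pool(feature_map, (0, 0, 14, 14))    # 2x2 cells per bin
pooled_small = roi_pool(feature_map, (2, 3, 7, 6))     # RoI smaller than 7x7
print(pooled_full.shape, pooled_small.shape)
```

Whatever the RoI size, the output shape is fixed, which is what lets RoIs of different sizes share the same FC Layer.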

How to train and test data

Run setup.py
Update the path information inside the config file under Configs folder
- update DATA_JSON_FILE to your json file
- update PATH_IMAGES to your image folder
Run TrainAndTest.py

Update 2020/01/18

made RPN(Region Proposal Network) head

RPN with backbone

made RoI(Region of Interest) head

RoI with backbone

TODO

optimize the program
run cocoeval
mask head
score head
convert to TensorFlow Lite
