matterport/Mask_RCNN

Extract ROI-pooled features for detected box

9thDimension opened this issue · 4 comments

I've had good success training Mask RCNN to detect my objects.

Now my goal is to extract the ROI-pooled features for each detected box so that I can do further analysis and/or training with them. It looks like, prior to being classified, each ROI on the final feature maps is resized to a fixed 7x7x256.

It seems this line, https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L925, is responsible for the ROI pooling, i.e. extracting fixed-size 7x7x256 features from variously sized boxes around the image. So I made the following modification:

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Align: crop each ROI from the FPN feature maps and resize it to
    # [pool_size, pool_size] -> shape [batch, num_rois, 7, 7, 256]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    pooled_roi_features = x  # my modification: keep a handle to the pooled features

    # ... rest of the original method unchanged: the two fc layers
    # (implemented as TimeDistributed convs) and the class/bbox heads ...

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox, pooled_roi_features

So fpn_classifier_graph() now returns the actual pooled features, as well as the class logits, class probabilities, and box refinements.
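To actually retrieve the features at inference time, the extra return value also needs to be threaded through MaskRCNN.build(). A rough, untested sketch of the inference branch (the fpn_classifier_graph call and the KM.Model construction already exist in build(); only the extra unpacked value and the extra output are new):

# In MaskRCNN.build(), inference branch: unpack the extra return value
mrcnn_class_logits, mrcnn_class, mrcnn_bbox, pooled_roi_features = \
    fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                         config.POOL_SIZE, config.NUM_CLASSES,
                         train_bn=config.TRAIN_BN,
                         fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

# ... DetectionLayer, mask head, etc. unchanged ...

# Append the pooled features to the outputs so keras_model.predict()
# returns them alongside the detections
model = KM.Model([input_image, input_image_meta, input_anchors],
                 [detections, mrcnn_class, mrcnn_bbox, mrcnn_mask,
                  rpn_rois, rpn_class, rpn_bbox, pooled_roi_features],
                 name='mask_rcnn')

Note that detect() unpacks a fixed number of outputs from keras_model.predict(), so it would need a matching change.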

The pooled_roi_features tensor has shape (batch, 1000, 7, 7, 256). However, shortly afterwards the method refine_detections_graph takes in the ROI class scores and box coordinates in order to prune away (overlapping? background?) results: https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L685. After going through this method, only config.DETECTION_MAX_INSTANCES = 100 detections remain. I can't really follow how they are filtered and possibly re-ordered, but I can see that extensive use is made of NMS and tf.gather.

Also, I'm not totally sure what is meant when the docstring says the input is rois: [N, (y1, x1, y2, x2)] in normalized coordinates. What exactly is the normalization? To the unit square? To the shrunken image size (128x128 by default?)?
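Digging around, mrcnn/utils.py has norm_boxes(), which seems to answer this: boxes are scaled into the unit square [0, 1] relative to the molded (resized and padded) image shape, not the original image:

import numpy as np

# From mrcnn/utils.py: converts pixel boxes to normalized coordinates.
# shape is the molded image's (height, width) -- 1024x1024 with the
# default config -- so the resulting coordinates lie in [0, 1].
def norm_boxes(boxes, shape):
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)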

I think I need to apply the same filtering operations inside refine_detections_graph() to cut the ROI-pooled features down from 1000 to the same 100 detections. Any help with this is much appreciated.
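For what it's worth, the plan I'm considering (untested sketch): refine_detections_graph() accumulates the indices of the surviving ROIs in its keep variable, so returning keep as well would allow gathering the matching pooled features. DetectionLayer runs this function once per image via utils.batch_slice, so the gather would live inside that per-image slice (pooled_features_one_image is a hypothetical name for one image's slice of the pooled features):

import tensorflow as tf

# Untested sketch. Inside refine_detections_graph(), after the confidence
# filter, per-class NMS, and top-k steps, `keep` holds the row indices of
# the surviving ROIs. Change its return statement to:
#     return detections, keep
# Then, for one image's pooled features of shape [1000, 7, 7, 256]:
kept_features = tf.gather(pooled_features_one_image, keep)
# Pad to DETECTION_MAX_INSTANCES rows, mirroring how `detections` is padded:
gap = config.DETECTION_MAX_INSTANCES - tf.shape(kept_features)[0]
kept_features = tf.pad(kept_features, [(0, gap), (0, 0), (0, 0), (0, 0)])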

Hi, did you solve this problem? I'd like to extract the ROI-aligned features for a specific bounding box in the original image. If you find a way to do this, could you share the code? Thanks a lot!

Any word on this? I'm finding the documentation for methods like refine_detections_graph() to be quite sparse. I am still unable to filter the ROI-pooled features alongside the other outputs.

@9thDimension Hi, can you please share a link to your modified model.py file? I'm getting an error when making the changes you described.