bethgelab/siamese-mask-rcnn

Understanding the evaluation output

howardyclo opened this issue · 1 comment

Hi, thanks for the interesting work!

I am trying to reproduce the mAP50 reported in the paper, but I am having trouble understanding the evaluation output.
I downloaded the pretrained model large_siamese_mrcnn_coco_full_0320.h5 and ran the experiments/evaluate-experiments.ipynb.
After the first run of evaluation on all classes, the output result is:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.221
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.357
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.238
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.118
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.216
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.310
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.201
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.377
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.395
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.217
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.416
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.570

How should I interpret this result, and how does it relate to the numbers reported in the paper? (Siamese Mask R-CNN: 35.7% mAP50 for detection and 33.4% for instance segmentation)

The second line gives you the detection result from the paper:

Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.357

If you choose eval_type=["bbox", "segm"] in the siamese_utils.evaluate_dataset function (a minimal call sketch follows below), you should get a second such result box, preceded by Evaluate annotation type *segm*, whose second line gives you the instance segmentation result:

Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.334

(These results might fluctuate a bit due to the evaluation process. We recommend evaluating 5 times to get a good estimate and will update this in the paper.)
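For reference, a call might look like the sketch below. Only the function name and the eval_type argument are taken from this thread; the import path and the remaining argument names are placeholders and may differ from the actual signature used in experiments/evaluate-experiments.ipynb.

```python
# Hedged sketch: only siamese_utils.evaluate_dataset and eval_type are confirmed
# by this thread; the import path and the other argument names are assumptions.
from lib import utils as siamese_utils  # import path may differ in this repo

# model:   the Siamese Mask R-CNN model loaded with the pretrained weights
# dataset: the COCO validation set prepared as in the evaluation notebook
for run in range(5):  # evaluate several times to average out fluctuations
    siamese_utils.evaluate_dataset(
        model,
        dataset,
        eval_type=["bbox", "segm"],  # report both detection and segmentation metrics
    )
```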

Please be aware that we chose to report mAP50 = AP@[0.5] instead of the "COCO metric" mAP = AP@[0.5:0.95]. mAP50 is evaluated at a single Intersection over Union (IoU) threshold of 50% between predicted and ground truth bounding boxes (corresponding to roughly 70% overlap between two same-sized boxes/masks), while mAP is averaged over IoU thresholds of [50%, 55%, ..., 95%], which puts additional weight on exact bounding box/segmentation mask predictions. An in-depth discussion of object detection metrics, and of mAP in particular, can be found at: https://github.com/rafaelpadilla/Object-Detection-Metrics
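If you want to see exactly where these two numbers come from in the summary printout, here is a rough pycocotools sketch (the annotation/result file names are illustrative and not tied to this repo's evaluation code): mAP50 is stats[1] and the COCO-metric mAP is stats[0] after summarize().

```python
# Hedged sketch using pycocotools directly; file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")      # ground-truth annotations (example path)
coco_dt = coco_gt.loadRes("detections.json")  # model predictions (example path)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # use iouType="segm" for masks
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the 12-line AP/AR summary shown above

# coco_eval.stats holds those 12 numbers:
# stats[0] = mAP   = AP@[0.50:0.95]  (weights exact box/mask quality)
# stats[1] = mAP50 = AP@0.50          (single IoU threshold, used in the paper)
print("mAP   :", coco_eval.stats[0])
print("mAP50 :", coco_eval.stats[1])
```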

We think that mAP50 is the value most reflective of the question we are interested in: whether our model can find novel objects based on a single reference image. For instance segmentation, the additional information about mask quality implicitly included in mAP might make sense. However, we found that correctly masking the sought objects was less of a problem for our model than correctly classifying them.

I hope I was able to answer your question. Feel free to ask if anything else about the results is unclear!