Object detection in an urban environment
Video Output
Models Trained (With Config Files)
EfficientDet
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.074
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.183
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.053
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.279
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.258
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.020
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.086
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.127
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.072
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
Faster-RCNN ResNet50
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.121
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.282
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.092
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.049
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.316
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.029
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.126
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.188
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.108
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.534
SSD-MobileNet-V1-FPN
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.195
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.174
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.090
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.493
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.039
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.182
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.602
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.703
SSD-MobileNet-V2-FPNLite
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.093
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.188
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.084
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.035
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.356
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.470
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.023
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.100
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.156
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.471
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.541
Discussion
Overall, SSD-MobileNet-V1-FPN performed the best, with mAP@[IoU=0.50:0.95] = 0.195.
For small objects, SSD-MobileNet-V1-FPN again performs the best, with mAP@[IoU=0.50:0.95] = 0.090.
For large objects, SSD-MobileNet-V1-FPN again performs the best, with mAP@[IoU=0.50:0.95] = 0.611.
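The ranking above can be checked with a few lines of Python. This is only a minimal sketch: the AP values are copied from the evaluation summaries in this README, while the dictionary layout and the helper name `best_model` are illustrative and not part of the project code.

```python
# AP values copied from the evaluation summaries above
# (IoU=0.50:0.95, maxDets=100), keyed by object-size bucket.
# The data layout and the helper name `best_model` are illustrative only.
ap_scores = {
    "EfficientDet": {"all": 0.074, "small": 0.028, "medium": 0.279, "large": 0.258},
    "Faster-RCNN ResNet50": {"all": 0.121, "small": 0.049, "medium": 0.316, "large": 0.442},
    "SSD-MobileNet-V1-FPN": {"all": 0.195, "small": 0.090, "medium": 0.493, "large": 0.611},
    "SSD-MobileNet-V2-FPNLite": {"all": 0.093, "small": 0.035, "medium": 0.356, "large": 0.470},
}


def best_model(scores, area):
    """Return (model_name, ap) for the model with the highest AP in one size bucket."""
    name = max(scores, key=lambda model: scores[model][area])
    return name, scores[name][area]


for area in ("all", "small", "medium", "large"):
    name, ap = best_model(ap_scores, area)
    print(f"{area:>6}: {name} (AP = {ap:.3f})")
```

Note that these numbers are the ones from the first round of experiments, where Faster-RCNN ResNet50 was trained with a batch size of 4.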
All models were trained for 2000 steps with a momentum optimizer. Faster-RCNN ResNet50 was the hardest to train and required a larger instance; the details are in the relevant notebook. The batch size was 8 for all models except Faster-RCNN ResNet50, which used a batch size of 4 so that training could finish within a reasonable time on the available compute.
In my opinion, different models would need different solver types, solver options, and batch sizes to perform at their best. However, in these experiments, with the solver parameters held fixed (apart from the smaller batch size for Faster-RCNN ResNet50), SSD-MobileNet-V1-FPN has the best object detection metrics.
Update
I trained Faster-RCNN ResNet50 with a larger batch size of 8 and obtained the following performance:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.169
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.348
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.147
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.071
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.434
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.038
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.166
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.230
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.709
The final conclusion: with all other settings held constant, SSD-MobileNet-V1-FPN still has the best overall performance (mAP@[IoU=0.50:0.95] of 0.195 versus 0.169), but Faster-RCNN ResNet50, when trained with a batch size of 8 instead of the 4 used in the earlier experiments, achieves the best mAP on large objects of all the models (0.650 versus 0.611).
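To make this kind of comparison less error-prone than reading the numbers by eye, the COCO-style summary text can be parsed into a dictionary. The sketch below assumes the summary lines have exactly the shape printed in this README; the function name `parse_coco_summary` and the key layout are hypothetical and not part of any evaluation library.

```python
import re

# Matches one line of the COCO-style summary, for example:
# "Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650"
SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_coco_summary(text):
    """Map (metric, iou, area, max_dets) -> value for every summary line in `text`."""
    results = {}
    for match in SUMMARY_LINE.finditer(text):
        metric, iou, area, max_dets, value = match.groups()
        results[(metric, iou, area, int(max_dets))] = float(value)
    return results


# Lines copied from the summaries in this README.
faster_rcnn_bs8 = parse_coco_summary("""
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.169
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650
""")
ssd_v1_fpn = parse_coco_summary("""
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.195
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
""")

for area in ("all", "large"):
    key = ("AP", "0.50:0.95", area, 100)
    if faster_rcnn_bs8[key] > ssd_v1_fpn[key]:
        winner = "Faster-RCNN ResNet50 (batch size 8)"
    else:
        winner = "SSD-MobileNet-V1-FPN"
    print(f"{area}: {winner}")
```

Keying on the full (metric, IoU, area, maxDets) tuple keeps the three `area=all` recall rows distinct, since they differ only in `maxDets`.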