
AnchorGenerator not generating complete anchors at frequent strides

JohnMBrandt opened this issue · 0 comments

Describe the bug
I am implementing an object detector based on cascade-rcnn_r50_fpn with a CocoDataset. I am working on small, dense object detection, and have adjusted the anchor settings accordingly, to:

            scales=[1, 1.5, 2, 2.5, 3, 4],
            ratios=[0.5, 1, 2],
            strides=[4, 8, 16, 32, 64],
            base_sizes=[4, 8, 16, 32, 64],),

This network trains and generates good results on 512 x 512 images. I have noticed, however, that many of my ground truth boxes are missed by the anchor generator, because even a stride of 4 is too coarse to produce anchors with high enough IoU. This was identified with Pyodi.
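To illustrate why the stride limits the achievable IoU for small boxes, here is a rough sketch (not the Pyodi output). It assumes anchor centres sit on a regular grid spaced one stride apart, that an anchor exists with exactly the same size as the ground truth box, and a hypothetical 6 x 6 pixel box; the function names are made up for this example.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def worst_case_iou(box_size, stride):
    """IoU with the nearest same-size anchor when the box centre
    falls midway between anchor centres on both axes."""
    half = box_size / 2
    offset = stride / 2  # farthest a centre can be from the grid, per axis
    anchor = (-half, -half, half, half)
    gt = (offset - half, offset - half, offset + half, offset + half)
    return iou(anchor, gt)


for stride in (4, 2):
    print(stride, round(worst_case_iou(6, stride), 3))
# 4 0.286
# 2 0.532
```

Under these assumptions, a 6 x 6 box can fall to an IoU of about 0.29 at stride 4, while stride 2 keeps the worst case above 0.5.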

However, if I make the strides finer, from [4, 8, 16, 32, 64] to [2, 4, 8, 16, 32], configure rpn_proposal to allow the larger number of regions, and adjust the base_sizes and scales accordingly, the behavior is not as expected.

            scales=[2, 4, 8],
            ratios=[0.5, 1, 2],
            strides=[2, 4, 8, 16, 32],
            base_sizes=[2, 4, 8, 16, 32],),

Assuming feature map sizes of 256, 128, 64, 32, 16 for a 512 x 512 image, this should generate 256*256 + 128*128 + 64*64 + 32*32 + 16*16 = ~87,000 anchor locations (each location carries one anchor per scale/ratio combination). With the 4, 8, 16, 32, 64 strides, this should generate ~21,800 anchor locations.
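The arithmetic above can be checked with a short sketch. It assumes that each pyramid level's feature map has side ceil(image_size / stride), which the issue takes for granted, and that every location carries len(scales) * len(ratios) anchors; `anchor_counts` is a hypothetical helper, not part of MMDetection.

```python
import math


def anchor_counts(image_size, strides, scales, ratios):
    """Return (anchor locations, total anchors) for a square image.

    Assumes the feature map at each level has side
    ceil(image_size / stride) and that each location carries
    len(scales) * len(ratios) anchors.
    """
    per_location = len(scales) * len(ratios)
    locations = sum(math.ceil(image_size / s) ** 2 for s in strides)
    return locations, locations * per_location


fine = anchor_counts(512, [2, 4, 8, 16, 32], [2, 4, 8], [0.5, 1, 2])
coarse = anchor_counts(512, [4, 8, 16, 32, 64],
                       [1, 1.5, 2, 2.5, 3, 4], [0.5, 1, 2])
print(fine)    # (87296, 785664)
print(coarse)  # (21824, 392832)
```

The figures of roughly 87,000 and 21,800 match the location counts; the total anchor counts are larger by the number of scale/ratio combinations per location.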

Based on these calculations, I would expect the above modifications to the rpn_proposal to enable predictions to be generated for the entire input image. However, the model does not predict boxes on the whole image.


Reproduction

  1. Adjust the cascade-rcnn_r50_fpn config to include the above modifications to anchor_generator and rpn_proposal, train on any COCO-style dataset with 512 x 512 images, and visualize the results.

  2. What dataset did you use? A custom COCO-style dataset for development; also tested with the balloon dataset.

Here is an example after 5 epochs with strides=[2, 4, 8, 16, 32]: none of the images get boxes in the bottom or right sections.

Here is an example after 5 epochs with strides=[4, 8, 16, 32, 64]: predictions look as expected.
