hustvl/QueryInst

Learned Proposal Boxes?

JialianW opened this issue · 3 comments

I took a look at self.init_proposal_bboxes.weight in your trained model and found that the box coordinates were not learned: they stayed around the initial values of 0.5, 0.5, 1, 1. Is this a problem? Thanks

Hi @JialianW, thanks for your valuable issue!

We add no constraint to the proposal boxes at the first stage during training, so the proposal boxes are naturally learnt to be (~0.5, ~0.5, ~1, ~1).

You can also check the pre-trained weights provided by the official Sparse R-CNN repo: the learnt proposal boxes at the first stage should also be (~0.5, ~0.5, ~1, ~1) (so Fig. 5 in the Sparse R-CNN paper is incorrect...). That means the proposal boxes at the first stage always tend to RoIAlign nearly the whole P5 feature map for every instance, which is somewhat intuitive.
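To make the "nearly the whole feature map" point concrete, here is a minimal sketch. It assumes the proposal boxes are stored as normalized (cx, cy, w, h) values, which is consistent with the 0.5, 0.5, 1, 1 initialization mentioned above; the helper names `cxcywh_to_xyxy` and `image_coverage` are hypothetical and not taken from the repo.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    x1 = (cx - 0.5 * w) * img_w
    y1 = (cy - 0.5 * h) * img_h
    x2 = (cx + 0.5 * w) * img_w
    y2 = (cy + 0.5 * h) * img_h
    return (x1, y1, x2, y2)


def image_coverage(box_xyxy, img_w, img_h):
    """Fraction of the image area covered by the box, after clipping to the image."""
    x1, y1, x2, y2 = box_xyxy
    x1, x2 = max(x1, 0.0), min(x2, float(img_w))
    y1, y2 = max(y1, 0.0), min(y2, float(img_h))
    inter = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    return inter / float(img_w * img_h)


# A proposal box at the initial value spans the full image (assumed 800x600 here).
full_box = cxcywh_to_xyxy((0.5, 0.5, 1.0, 1.0), 800, 600)
print(full_box, image_coverage(full_box, 800, 600))
```

Under this assumption, a box that stays near (0.5, 0.5, 1, 1) pools features from essentially the entire image, regardless of where the objects are.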

We also find that detaching the proposal boxes at the first stage gives similar object detection results.

Thank you for the useful answers! I have another question regarding your YouTube-VIS experiments: did you change the RoIAlign mapping threshold for FPN? The default mapping threshold is 56, defined in "class SingleRoIExtractor". Since you use relatively small images for YouTube-VIS, I was wondering if you changed this so that features are extracted from higher-resolution FPN levels?

We haven't changed that :(
Your suggestion seems quite reasonable and we will try it in the future, thanks!
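For readers unfamiliar with the mapping threshold discussed above, the following is a rough pure-Python sketch of how an RoI is assigned to an FPN level from its size. It mirrors, as far as I understand it, the rule used by MMDetection's SingleRoIExtractor (level = floor(log2(sqrt(w * h) / threshold)), clamped to the available levels); the function name `map_roi_level` and the parameter name `finest_scale` are assumptions for illustration, and the real implementation operates on tensors.

```python
import math


def map_roi_level(roi_xyxy, num_levels=4, finest_scale=56):
    """Pick an FPN level index (0 = highest resolution) for one RoI.

    Sketch only: `finest_scale` plays the role of the mapping
    threshold of 56 mentioned in the thread.
    """
    x1, y1, x2, y2 = roi_xyxy
    # RoI "scale" is the square root of its area.
    scale = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    # Small epsilon avoids log2(0) for degenerate boxes.
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    # Clamp to the valid range of levels.
    return min(max(level, 0), num_levels - 1)


roi = (0.0, 0.0, 120.0, 120.0)
print(map_roi_level(roi, finest_scale=56))   # default threshold
print(map_roi_level(roi, finest_scale=112))  # larger threshold
```

In this sketch, raising the threshold sends the same 120x120 RoI to a finer (higher-resolution) level, which is the kind of change the question is asking about.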