feymanpriv/DELG

ROxf / RPar query image cropping

Opened this issue · 5 comments

Hello @feymanpriv,

Thanks for providing this PyTorch implementation of DELG! This can be very helpful to the community.

I was looking at the retrieval evaluation code, and I have a question about the experimental protocol on the Revisited Oxford and Paris datasets:

I don't see where query images are cropped before feature extraction. I believe feature extraction is performed here, if I am not mistaken, and it does not seem to distinguish between query and index images (query images should be cropped, while index images should not). Perhaps the query images are pre-cropped before image loading? (This is uncommon and, as far as I can tell, not done in the dataset preparation code in this repo.)

The need for query image cropping is described in the Revisited Oxford/Paris paper. See section 2.3,

Only the cropped regions are to be used as queries; never the full image, since the ground-truth labeling strictly considers only the visual content inside the query region.

The authors also provide example code that shows the query cropping, see here. As another example, here's how we do it in the DELG TF codebase (cropping before extraction if the images are queries).

Naturally, if images are not cropped before feature extraction, the performance should be much higher given the much larger context.
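To make the protocol concrete, here is a minimal sketch of the cropping step described above. It is not the code from this repo or from the DELG TF codebase. It assumes each query comes with a bounding box given as `(xmin, ymin, xmax, ymax)` in pixel coordinates, and that `image` behaves like a Pillow image (a `.size` attribute and a `.crop(box)` method taking `(left, upper, right, lower)`). The names `clamp_bbox` and `prepare_image` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def clamp_bbox(bbx: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Round an (xmin, ymin, xmax, ymax) box to integers and clip it to the image."""
    xmin, ymin, xmax, ymax = bbx
    left = max(0, min(width, int(round(xmin))))
    upper = max(0, min(height, int(round(ymin))))
    # Keep right >= left and lower >= upper so the crop box is never inverted.
    right = max(left, min(width, int(round(xmax))))
    lower = max(upper, min(height, int(round(ymax))))
    return left, upper, right, lower


def prepare_image(image, is_query: bool, bbx: Optional[Sequence[float]] = None):
    """Return the image that should be passed to feature extraction.

    Index images are returned unchanged; query images are cropped to their
    annotated region first. `image` is assumed to be Pillow-like.
    """
    if not is_query:
        return image
    if bbx is None:
        raise ValueError("query images need a bounding box")
    width, height = image.size
    return image.crop(clamp_bbox(bbx, width, height))
```

With Pillow, `image` would typically come from `Image.open(path).convert("RGB")`, and the extraction loop would call `prepare_image(image, is_query=True, bbx=...)` for queries and `prepare_image(image, is_query=False)` for index images.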

I am also wondering whether this was the same protocol used in your DOLG paper. There, I see huge gains due to a simple reimplementation of DELG (e.g., +14-18pp improvement for results in the R1M large-scale dataset) -- those look a bit suspicious.

Again, thanks a lot for your work here. My goal is not to remove merit from your work at all (I find DOLG very interesting!) -- I just really want to clarify the protocol and make sure we would be comparing results apples-to-apples.

Best,
Andre

@andrefaraujo
Thank you for the reminder. I was already aware of this problem, and I have tested the DOLG model with cropped queries. Results differ only slightly when testing on ROxf and RPar without 1M, and the model can still achieve the performance reported in the DOLG paper.

As for the evaluation with 1M, I also found it strange when my collaborator reported the mAP on the 1M dataset during the rebuttal process (DOLG did not report results on 1M at the start). I emailed one of the authors of DELG but got no reply. I assumed the performance would change consistently with the results on ROxf and RPar without 1M, so we put it directly in the paper.

After your reminder, it now appears unfair to compare with DELG on the 1M dataset. We are going to re-verify the results soon, and we will update them in the paper later.

Thanks again!

" i have tested the model by cropping the query of DOLG. Results differ just a little when testing on Roxf and Rpar without 1M and can achieve the performance in the DOLG paper. "

When I use multi-scale testing, the results are even better.

Thanks a lot for your quick response, @feymanpriv!

Looking forward to having all the results with the query being cropped. Please feel free to reply to this GitHub issue once you have the numbers :)

Hi @feymanpriv, just a friendly ping here. Were you able to obtain the results with the fixed cropping?

Thanks again!

@andrefaraujo and @feymanpriv
I uploaded the reproduced ROxford/RParis results for the R101-DOLG model, including the query cropping process, here.
I hope this helps.