Relja/netvlad

Unexpected recall values

gmberton opened this issue · 6 comments

Hello, I would like to run inference only on Tokyo247 using AlexNet + Max (off-the-shelf).
After running the following code I get rec@10 = 0.0762, but the paper reports a rec@10 of around 0.32 for AlexNet + Max off-the-shelf. Am I doing something wrong in building the net? How can I load one of your trained nets from the project page? I don't think it matters since the weights of the net are fixed, but for reference I'm using MATLAB R2020a, matconvnet-1.0-beta25, and CUDA 10.1.

```matlab
cd my_path/netvlad;
setup;
paths = localPaths();
opts.netID = 'caffe';
opts.layerName = 'conv5';
opts.method = 'max';
net = loadNet(opts.netID, opts.layerName);
net = addLayers(net, opts, []);

dbTest = dbTokyo247();
qFeatFn = sprintf('%s%s_%s_q.bin', paths.outPrefix, opts.netID, dbTest.name);
dbFeatFn = sprintf('%s%s_%s_db.bin', paths.outPrefix, opts.netID, dbTest.name);
serialAllFeats(net, dbTest.qPath, dbTest.qImageFns, qFeatFn, 'batchSize', 1);
serialAllFeats(net, dbTest.dbPath, dbTest.dbImageFns, dbFeatFn, 'batchSize', 64);
qFeat = fread( fopen(qFeatFn, 'rb'), inf, 'float32=>single');
dbFeat = fread( fopen(dbFeatFn, 'rb'), inf, 'float32=>single');
[recall, ~, ~, opts] = testFromFn(dbTest, dbFeatFn, qFeatFn);
```

Relja commented

Hi,

Hmm, I'm not sure what could be wrong, to be honest. I don't have Matlab any more, so I can't check these commands. Can you try some other network, e.g. the trained max pooling one or one of the NetVLAD ones, and see if you get reasonable results? I'm quite sure those should work, as quite a few people have replicated the results without any problems.

Btw the qFeat and dbFeat lines are unnecessary because they are not used here. If you do want to use them, they should be reshaped appropriately (e.g. see testFromFn).
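
To illustrate the reshaping point, here is a minimal, self-contained sketch. It does not use the NetVLAD code; the dimensions `D` and `N`, the temporary file, and the assumption that descriptors are stored back to back (one per image, column by column) are all made up for the demo. The actual layout should be checked against `testFromFn`.

```matlab
% Hypothetical demo: round-trip a small feature matrix through a flat
% float32 file to show why a reshape is needed after fread.
D = 4;                                   % descriptor dimension (made up)
N = 3;                                   % number of images (made up)
feats = single(reshape(1:D*N, D, N));    % D x N, one column per image

fn = [tempname, '.bin'];                 % temporary file for the demo
fid = fopen(fn, 'wb');
fwrite(fid, feats, 'float32');           % written in column-major order
fclose(fid);

fid = fopen(fn, 'rb');
flat = fread(fid, inf, 'float32=>single');   % (D*N) x 1 column vector
fclose(fid);                             % close the handle explicitly
delete(fn);

% Assumption: one descriptor per image, stored back to back.
featsBack = reshape(flat, [], N);        % back to D x N
assert(isequal(featsBack, feats));
```

In the snippet from the question, `N` would presumably be `numel(dbTest.qImageFns)` for the queries and `numel(dbTest.dbImageFns)` for the database. Keeping the `fopen` handle in a variable also lets the file be closed afterwards.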

gmberton commented

I tried with `caffe_tokyoTM_conv5_max` and `caffe_tokyoTM_conv5_vlad_preL2_intra`, and I'm getting similarly low results. I can't seem to figure out the problem.
By the way, how are the queries resized?
Did you follow the same approach as "24/7 place recognition by view synthesis", which says "we re-size each image to have the maximum dimension of 640 pixels", or did you resize to a fixed shape, or even to 480x640 for horizontal images and 640x480 for vertical ones?

Relja commented

That's strange as I had plenty of people reproducing the numbers.
We do the same thing as the 24/7 paper.
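
A minimal sketch of the "maximum dimension of 640 pixels" rule quoted above, for reference. The helper name `targetSize` and the example image sizes are hypothetical and not part of the NetVLAD code base. Whether images smaller than 640 pixels are upscaled is not stated in this thread, so the sketch simply scales every input.

```matlab
% Hypothetical helper: compute the output size so that the longer side
% equals maxDim while the aspect ratio is preserved.
targetSize = @(sz, maxDim) round(sz * (maxDim / max(sz)));

maxDim = 640;
landscape = targetSize([1200, 1600], maxDim);   % [height, width] input
portrait  = targetSize([1600, 1200], maxDim);

% The longer side ends up at maxDim in both orientations.
assert(max(landscape) == maxDim);
assert(max(portrait) == maxDim);
```

With the Image Processing Toolbox available, the resulting size could then be passed to `imresize`, e.g. `imresize(im, targetSize([size(im, 1), size(im, 2)], 640))`.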

Relja commented

Hi, have you figured this out? I'd like to close the issue if it turned out to be a mistake.

gmberton commented

Sorry, I stopped working on this and couldn't solve the problem.

Relja commented

Ok, sorry about that. I will close the issue, as no one else has complained about it and there's no way for me to test. If it becomes a problem for more people, I can try to investigate somehow (though it's a bit tough since, for example, I don't have MATLAB).