Relja/netvlad

Incorrect image normalisation with AlexNet model

Nanne opened this issue · 1 comments

Nanne commented

The input images in your codebase are normalised as follows:
im(:,:,1)= im(:,:,1) - net.meta.normalization.averageImage(1,1,1);
im(:,:,2)= im(:,:,2) - net.meta.normalization.averageImage(1,1,2);
im(:,:,3)= im(:,:,3) - net.meta.normalization.averageImage(1,1,3);

While averageImage is a 3-D array (of size [224,224,3] for vgg16 and [227,227,3] for AlexNet), only the first pixel's RGB values are used. Luckily, for the models using vgg16 this is not a problem, because all pixels have the same RGB value. However, for the AlexNet models this is not the case:

metadata = load('caffe_pitts30k_conv5_vlad_preL2_intra.mat');
mean(mean(metadata.net.meta.normalization.averageImage, 1), 2)

ans(:,:,1) = 122.6769
ans(:,:,2) = 116.6700
ans(:,:,3) = 104.0102

metadata.net.meta.normalization.averageImage(1,1,:)

ans(:,:,1) = 117.3785
ans(:,:,2) = 117.6438
ans(:,:,3) = 110.1771

If done consistently between train and test it's probably not a major issue, but it might affect generalisation performance. I didn't have a chance to test it (nor a proper setup to do it), but it seems worthwhile (if I didn't miss anything) to retrain and re-evaluate the AlexNet models with correct normalisation.
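To make the difference concrete, here is a minimal sketch comparing the first-pixel subtraction with subtraction of the full average image. It uses synthetic stand-ins for both the image and the average image (the names avgImg and im and the random data are assumptions for illustration, not the real model data), so it only shows the mechanics, not the actual size of the effect on the AlexNet models.

```matlab
% Synthetic stand-ins (assumptions, not the real model data).
rng(0);
avgImg = 110 + 10 * rand(227, 227, 3, 'single');  % spatially varying average image
im     = 255 * rand(227, 227, 3, 'single');       % image already at the average-image size

% (a) What the snippet above does: subtract the RGB value of pixel (1,1) everywhere.
imFirstPixel = bsxfun(@minus, im, avgImg(1, 1, :));

% (b) Subtract the full average image (requires size(im) == size(avgImg)).
imFullMean = im - avgImg;

% If the average image were spatially constant, (a) and (b) would be identical.
isConstant = all(all(all(bsxfun(@eq, avgImg, avgImg(1, 1, :)))));
maxDiff    = max(abs(imFirstPixel(:) - imFullMean(:)));
fprintf('constant average image: %d, max |a - b| = %.4f\n', isConstant, maxDiff);
```

Running the same isConstant check on the averageImage stored in a downloaded model would be a quick way to tell whether that model is affected.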

Relja commented

Yes, that's true. As you say, since I train it like that, it shouldn't be an issue. At some point I also experimented with different average images (though this was before NetVLAD) and didn't see much difference. Recent architectures (VGG, ResNet, ...) tend not to use an average image at all and instead do a VGG-like input normalization, subtracting a single RGB value from every pixel.

One major reason for doing it this way is that it lets you run the network convolutionally, at any image resolution. If you wanted to use the mean image for arbitrary image sizes, you would need to resize (usually downscale) and crop the input, losing valuable detail.
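As a rough illustration of the resolution argument, the sketch below subtracts a single 1x1x3 RGB value from images of two different sizes. The per-channel values are the averages quoted earlier in this thread; the random images and the variable names are assumptions for illustration. Which constant to use (the channel-wise average or the first pixel) is a separate choice that this sketch does not settle.

```matlab
% Per-channel means quoted earlier in this thread for the AlexNet model.
chanMean = reshape(single([122.6769, 116.6700, 104.0102]), 1, 1, 3);

% Two synthetic images of different resolutions (stand-ins for real inputs).
rng(0);
imSmall = 255 * rand(240, 320, 3, 'single');
imLarge = 255 * rand(480, 640, 3, 'single');

% The 1x1x3 value expands over any height and width,
% so the input needs no resizing or cropping before normalization.
normSmall = bsxfun(@minus, imSmall, chanMean);
normLarge = bsxfun(@minus, imLarge, chanMean);

disp(size(normSmall));
disp(size(normLarge));
```

A full [227,227,3] average image, by contrast, could only be subtracted after bringing the input to exactly that size.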