gmberton/CosPlace

Can the evaluation of this model be used in other city images?

wpumain opened this issue · 4 comments

SF-XL is a dataset covering the city of San Francisco. If the model is trained on images of San Francisco, can it be evaluated on images of other cities?

The image features of each city are different. If the model is trained on images of city A, it can extract the features of city A very well, but not those of city B, so I think the training and evaluation images should come from the same area. In the paper, however, you trained the model on images of San Francisco and tested it on images of Tokyo. How should I understand this?

Yes, models trained with CosPlace can be evaluated on other cities, and in almost all cases they reach state-of-the-art results.
There are two main reasons:

  1. using the CosPlace loss (and its underlying CosFace) allows the model to extract better features than the triplet loss (and similar losses), and those features are robust to domain shifts (different cities, day/night);
  2. training on lots of data helps (obviously), but the CosPlace loss is needed to harness the power of all that data, while the triplet loss does not scale to large datasets (see the models trained on SF-XL with previous methods).
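To make the first point more concrete, here is a minimal sketch of a CosFace-style loss (a softmax over cosine similarities, with a margin subtracted from the correct class) for a single descriptor. It is not taken from the CosPlace code: the function names, the scale `s=30.0`, and the margin `m=0.40` are assumptions chosen for illustration, and a real training loop would use a batched PyTorch implementation.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosface_loss(descriptor, class_weights, label, s=30.0, m=0.40):
    """CosFace-style loss for one descriptor (illustrative sketch).

    descriptor:    image feature vector
    class_weights: one weight vector per class
    label:         index of the correct class
    s, m:          scale and margin (hypothetical values)
    """
    d = normalize(descriptor)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine similarity between the descriptor and the class weight.
        cos = sum(a * b for a, b in zip(d, normalize(w)))
        # The margin is subtracted only from the correct class.
        if j == label:
            cos -= m
        logits.append(s * cos)
    # Numerically stable log-sum-exp, then cross-entropy for the true class.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]
```

Because the margin lowers the logit of the correct class, a descriptor has to be closer to its own class weight than plain softmax would require before the loss becomes small.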

In the paper we also show that CosPlace does not work well when trained on few images (thousands).
Training CosPlace on a large-scale dataset from Pittsburgh and evaluating on Pittsburgh would probably give better results than using CosPlace trained on San Francisco.

Finally, it is not that surprising that a model trained on one city performs well on another, as many previous methods (e.g. SFRS) were trained on Pitts30k and tested on Tokyo 24/7 with good results.
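Evaluation on another city works the same way as evaluation on the training city: descriptors are extracted for the queries and the database, and a query counts as correct if one of its top-N retrieved database images is a true match. A minimal sketch of that recall@N computation follows. The function name and inputs are hypothetical, descriptors are assumed to be already L2-normalized, and `positives` is assumed to hold, for each query, the set of database indices that count as correct matches (however the dataset defines them).

```python
def recall_at_n(query_descs, db_descs, positives, n=1):
    """Fraction of queries with at least one positive in the top-n results.

    query_descs: list of L2-normalized query descriptors
    db_descs:    list of L2-normalized database descriptors
    positives:   list of sets; positives[i] holds the database indices
                 that are correct matches for query i
    """
    hits = 0
    for q, pos in zip(query_descs, positives):
        # Dot product equals cosine similarity for normalized descriptors.
        sims = [sum(a * b for a, b in zip(q, d)) for d in db_descs]
        # Indices of the n most similar database images.
        top = sorted(range(len(db_descs)), key=lambda i: sims[i], reverse=True)[:n]
        if any(i in pos for i in top):
            hits += 1
    return hits / len(query_descs)
```

Nothing in this computation depends on which city the model was trained on, so the same trained model can be scored on San Francisco, Pittsburgh, or Tokyo queries alike.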

When you say "the CosPlace loss is needed to harness the power of all that data", do you mean training on all the groups, rather than only the few groups used in the code?

Even training on "a few groups" allows us to use millions of images. The 8 groups that we use contain 5M images.
Also, by "the CosPlace loss is needed to harness the power of all that data" I mean that previous losses do not benefit from large datasets (see Table 3 of the paper).
It is true, however, that even CosPlace does not use (or does not benefit from using) all the 41M images of SF-XL, and as we say in the supplementary, "train-time scalability is a factor that can still be vastly improved in future works".

Thank you for your help!