Clarification of dataset splits

Question

Closed this issue a month ago · 3 comments

Thanks to the authors for sharing their code.

Can the dataset splits be clarified? That is, can you specify the dataset(s) used for training and validation, as well as the exact split if training and validation come from the same dataset?

The paper does not make clear what the validation dataset consists of (e.g., whether it is a split of MP-16). This detail matters for reproducing the experiments and for fair comparisons in future work.

Thanks!

Answer 1 · 2024-08-29T21:31:43.000Z

Hi Angel, thanks for your question. During training, we used the full MP-16 dataset. As mentioned in our paper, the results of the ablation experiments were obtained by evaluating the model on the Im2GPS3k dataset. Please let us know if you have any other questions.

Answer 2 · 2024-08-29T22:05:18.000Z

Hi Vicente, thanks for the reply, but can you answer the question above before we close the issue?

I did read the paper thoroughly, but the question above is about the validation set. It isn't clear from the paper what it contains (I'm assuming just MP-16) or how it is composed (e.g., the method used to split a validation set from all MP-16 samples).

Can you clarify how the validation set is generated, so that one can retrain your model? Thanks

Answer 3 · 2024-08-30T00:05:11.000Z

Hi Angel. I apologize for the misunderstanding. Since you mentioned the experiments in the paper, I thought you were referring to that specific set. To answer your question: for our preliminary experiments, the validation set was a random subset of MP-16 containing 1% of the full dataset (≈47,000 GPS-image pairs).
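
For anyone trying to reproduce this, a random 1% hold-out could be sketched roughly as below. This is not the authors' code: the thread does not state the seed, the sampling procedure, or how MP-16 samples are indexed, so the function name `split_train_val`, the default seed, and the use of integer indices are all assumptions made for illustration.

```python
import random


def split_train_val(n_samples, val_fraction=0.01, seed=0):
    """Randomly hold out a fraction of sample indices for validation.

    Hypothetical helper: the seed and index-based layout are assumptions,
    not details given by the authors.
    """
    rng = random.Random(seed)  # local RNG, so the split is repeatable
    n_val = round(n_samples * val_fraction)
    # Sample validation indices without replacement.
    val_set = set(rng.sample(range(n_samples), n_val))
    val = sorted(val_set)
    # Everything not held out stays in the training set.
    train = [i for i in range(n_samples) if i not in val_set]
    return train, val


# Small demonstration; for MP-16, n_samples would be the actual number
# of GPS-image pairs in your copy of the dataset.
train_idx, val_idx = split_train_val(10_000, val_fraction=0.01, seed=0)
print(len(train_idx), len(val_idx))
```

Fixing the seed and saving the resulting validation indices alongside the checkpoints would let others compare against the same held-out samples, although it would not recover the exact subset used in the original preliminary experiments.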