jmfacil/single-view-place-recognition

About training the model

Opened this issue · 10 comments

@jmfacil Hi, your work is amazing! However, the code that trains the model is not open source. Could you release the training code? Also, I noticed that you trained the new layer with 834,746 image triplets for 5 epochs. How did you construct these 834,746 image triplets? Thank you very much!

Hi @jsdd25,
For training, you can find the solver and network definitions in the repository under models/not_fine_tuned, or models/fine_tuned for the fine-tuned version.

The triplets are created according to the specifications of the paper. We chose positive frames within a maximum distance of 3 frames from the anchor, which gives you 5 pairs between every two seasons; you can also consider the mirrored cases, which brings you to roughly that number of triplets. Removing corner cases (such as the beginning and end of a tunnel or of a training segment) and the same frame in the same season (which produces no gradient during training) gives you that number: ~24.5K training examples × 5 possible positives (4 when the positive is in the same season) × 4 seasons × 2 (mirroring). We also took care not to use negatives that are too close to the anchor.
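The arithmetic above can be sketched as follows; the frame and season counts come from the discussion, and the final printed figure is the upper bound before corner cases are removed (the paper's 834,746 is what remains afterwards):

```python
# Rough triplet count described above (illustrative arithmetic only).
N_ANCHORS = 24_500  # ~24.5K training examples
POSITIVES = 5       # positives per season pair (4 when in the anchor's own season)
SEASONS = 4
MIRROR = 2          # each triplet plus its mirrored version

approx_triplets = N_ANCHORS * POSITIVES * SEASONS * MIRROR
print(approx_triplets)  # 980000, before removing corner cases
```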

Hope this helps ^^. Best!

Thank you very much! However, I have some doubts about your sentence: "We took care of not considering negative values too close from the anchor." Does "negatives too close to the anchor" mean that the negative input is very close to the anchor? If so, I would think we should optimize exactly this distance relationship, to push the negative input farther from the anchor. Why did you avoid negatives that are too close to the anchor? Thank you!

Sorry for not being more specific. As you can imagine, you have many more negative pairs than positive ones. We found that the network converges faster if we don't select negative pairs that are too close (e.g. within three frames). We did not study this effect in depth, but my guess is that it mainly matters in the early stages of training. As you said, it would be interesting to study.
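A minimal sketch of this sampling rule, assuming frames are indexed sequentially; the `min_dist` value and the function itself are illustrative, not taken from the released code:

```python
import random

MIN_NEG_DIST = 3  # e.g. the three-frame exclusion zone mentioned above

def sample_negative(anchor_idx, num_frames, min_dist=MIN_NEG_DIST, rng=random):
    """Pick a random frame index at least `min_dist` frames from the anchor."""
    candidates = [i for i in range(num_frames)
                  if abs(i - anchor_idx) > min_dist]
    return rng.choice(candidates)

neg = sample_negative(anchor_idx=10, num_frames=100)
assert abs(neg - 10) > MIN_NEG_DIST
```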

Thank you very much! I have one more question. Suppose a triplet contains an anchor image, a positive image, and a negative image. After removing corner cases and the same frame in the same season, for each anchor image from the ~24.5K training examples we can find 19 positive images (4 in the same season and 5 in each of the other seasons). So we can construct 19 triplets per anchor image. Is my understanding correct? Then, do we randomly select an image (not too close to the anchor) from the remaining images as the negative?

Yes. You can also include another 19 images by mirroring them, and if you apply random RGB data augmentation you would not need to exclude the anchor's own frame in its season, so it would be 20 (since it would no longer generate exactly the same descriptor).
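The per-anchor count of 19 positives can be sketched like this. The ±2-frame window is an interpretation chosen to match the counts above (5 positives per other season, 4 in the anchor's own season); the actual window used in the paper may be defined differently:

```python
SEASONS = 4
WINDOW = 2  # assumed: frames within ±2 of the anchor count as positives

def positive_candidates(anchor_frame, anchor_season):
    """Enumerate (season, frame) positive candidates for one anchor."""
    out = []
    for season in range(SEASONS):
        for d in range(-WINDOW, WINDOW + 1):
            if season == anchor_season and d == 0:
                continue  # same frame in the same season: no gradient
            out.append((season, anchor_frame + d))
    return out

print(len(positive_candidates(anchor_frame=100, anchor_season=0)))  # 19
```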

Thank you very much for your help! However, I tried your method on the Alderley dataset. I set the batch_size to 5 and randomly selected an image (not too close to the anchor) from the remaining images as the negative. The loss function is the Wohlhart-Lepetit loss, and I did not use random RGB data augmentation. During training, the loss is often 0. After 5 epochs, the result is not good: with the day images as the query set and the night images as the reference set, the accuracy is only about 1%, and it does not improve after 15 epochs. Do you know what the problem might be? Thanks a lot!
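For reference, the Wohlhart-Lepetit triplet loss mentioned above has the form L = max(0, 1 − ||a − n|| / (m + ||a − p||)); a minimal sketch follows, where the margin value `m` is an assumption to check against the paper and your solver config. A frequently-zero loss means most sampled triplets already satisfy the ratio, so they contribute no gradient:

```python
import numpy as np

def wohlhart_lepetit_loss(anchor, positive, negative, margin=0.01):
    """L = max(0, 1 - ||a - n|| / (margin + ||a - p||))."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, 1.0 - d_neg / (margin + d_pos))

# If the negative is already much farther from the anchor than the positive,
# the loss saturates at zero and the triplet gives no gradient.
a = np.zeros(4)
p = np.array([0.1, 0.0, 0.0, 0.0])
n = np.array([5.0, 0.0, 0.0, 0.0])
print(wohlhart_lepetit_loss(a, p, n))  # 0.0
```

If nearly every triplet lands in this zero region, harder negative mining (while still respecting the minimum frame distance) is one common remedy.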

Hello, could you please share a download link for the Alderley Day/Night Dataset? I have searched for a long time and every site points to this link (https://wiki.qut.edu.au/pages/viewpage.action?pageId=181178395), but none of the dataset links under QUT will open. Could you share it via Baidu Netdisk or similar? Thank you very much.

@kaitaotang Hi, I have uploaded the dataset. You can download it from https://pan.baidu.com/s/1jCQJgBVabfAsgwKVsr05tA. The password is klj9. If you have further problems, you can contact me via e-mail: 773903267@qq.com

@jsdd25 The share link for the Alderley Day/Night Dataset has expired. Would you please share this dataset with me?