oravus/seqNet

Questions about the descriptors / image poses from image & pose timestamps

kaiyi98 opened this issue · 5 comments

Hi! Thanks for your work!
I have some doubts about the nordland image descriptors.

  1. The Nordland image descriptors are from NetVLAD. Was the NetVLAD model trained on Nordland?
  2. Which of oxford-pnv and oxford-v1.0 uses fixed-distance sampling?

Kind regards!

Also, I tested the oxford-v1.0 dataset using your oxford-v1.0_pretrained_l5_w3 model and got recall@1=0.7424, recall@5=0.8775, recall@20=0.9632. But the results in Table I of the paper are SeqNet(S5) recall@1/5/20 = 0.62/0.76/0.88.
I don't know why.
Thank you!

Hi @kaiyi98,

Hi! Thanks for your work! I have some doubts about the Nordland image descriptors.

  1. The Nordland image descriptors are from NetVLAD. Was the NetVLAD model trained on Nordland?

The provided single-image descriptors are not trained on Nordland but on Pittsburgh (30K). Please see this and this to train the model.

  2. Which of oxford-pnv and oxford-v1.0 uses fixed-distance sampling?

oxford-v1.0 uses fixed-distance sampling; oxford-pnv refers to the data split defined in the PointNetVLAD paper.
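
For context, "fixed-distance sampling" means keeping frames at roughly constant spatial intervals along the route rather than at constant time intervals. A minimal sketch of that idea, assuming a per-frame array of northing/easting positions in metres; the helper name and the 2 m threshold are illustrative, not from this repo:

import numpy as np

def fixedDistanceSample(positions, minDist=2.0):
    # positions: (N, 2) array of per-frame (northing, easting) in metres.
    # Keep a frame once the vehicle has moved at least minDist metres
    # since the last kept frame.
    keepInds = [0]
    lastPos = positions[0]
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - lastPos) >= minDist:
            keepInds.append(i)
            lastPos = positions[i]
    return np.array(keepInds)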

Hope that helps.

Also, I tested the oxford-v1.0 dataset using your oxford-v1.0_pretrained_l5_w3 model and got recall@1=0.7424, recall@5=0.8775, recall@20=0.9632. But the results in Table I of the paper are SeqNet(S5) recall@1/5/20 = 0.62/0.76/0.88. I don't know why. Thank you!

Both results are valid. The difference is in the underlying trained model. In the paper, we mainly reported results from testing with models trained on a different city, so the Oxford results in Table I correspond to a model trained on Brisbane data, and vice versa.

The results you obtained are expected to be better for Oxford, since that model was trained on Oxford itself.
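
For reference, recall@K here counts a query as correct if any of its top-K retrieved reference frames is a valid match under the localization tolerance. A minimal sketch of that computation, assuming a precomputed query-by-reference descriptor distance matrix dMat and a boolean ground-truth matrix gt of the same shape (names are illustrative, not the repo's evaluation code):

import numpy as np

def recallAtK(dMat, gt, ks=(1, 5, 20)):
    # dMat: (numQueries, numRefs) descriptor distances, smaller = more similar.
    # gt:   (numQueries, numRefs) True where a reference is a valid match.
    sortedInds = np.argsort(dMat, axis=1)  # best-matching references first
    recalls = {}
    for k in ks:
        hits = [gt[q, sortedInds[q, :k]].any() for q in range(dMat.shape[0])]
        recalls[k] = float(np.mean(hits))
    return recalls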

Hope that helps.

Thank you! Can you tell me how to align the GPS timestamps and the stereo.timestamps for Oxford RobotCar?

You can do it as below:

import numpy as np
from scipy.spatial.distance import cdist

def getClosestPoseTsIndsPerImgTs(poseTs, imgTs, memEff=True):
    # For each image timestamp, return the index of the nearest pose timestamp.
    if memEff:
        # Loop over image timestamps; avoids building the full pairwise distance matrix.
        matchInds = np.array([np.argmin(np.abs(poseTs - ts)) for ts in imgTs])
    else:
        # Vectorized: pairwise |poseTs - imgTs| matrix, then argmin over poses per image.
        diffMat = cdist(poseTs.reshape([-1, 1]), imgTs.reshape([-1, 1]))
        matchInds = np.argmin(diffMat, axis=0)
    return matchInds

# insTS: INS/GPS pose timestamps, imgTS: stereo image timestamps,
# insNE: per-pose (northing, easting); see the loading sketch below.
closeInds = getClosestPoseTsIndsPerImgTs(insTS, imgTS, memEff=True)
imgPoses = insNE[closeInds, :]
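
To obtain insTS, imgTS and insNE in the first place, something like the following should work, assuming the standard RobotCar traverse layout where gps/ins.csv has timestamp, northing and easting columns and stereo.timestamps is whitespace-separated with the timestamp in its first column (paths and column names here are assumptions, adjust to your local copy):

import numpy as np
import pandas as pd

# Assumed paths/columns for a RobotCar traverse.
ins = pd.read_csv('gps/ins.csv')
insTS = ins['timestamp'].values
insNE = ins[['northing', 'easting']].values

# stereo.timestamps: assumed whitespace-separated, timestamp in the first column.
imgTS = np.loadtxt('stereo.timestamps')[:, 0]

closeInds = getClosestPoseTsIndsPerImgTs(insTS, imgTS, memEff=True)
imgPoses = insNE[closeInds, :]  # (numImages, 2) northing/easting per image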