YujiaoShi/HighlyAccurate

Ford retraining instability

Miaowshroom opened this issue · 1 comments

Dear Dr. Shi,

Thanks for sharing the code for the "Beyond Cross-view Image Retrieval" paper. I am a big fan of your work.

I am retraining your models and have some questions; I would appreciate your advice:

  1. Epochs: in the paper you mention the models are trained for two epochs, and in the code training starts from epoch 0, so I assume the results in Tables 1 and 2 come from "Epoch 1", the second epoch. However, the default number of epochs for KITTI is set to 5, which trains up to "Epoch 4". Just to confirm: for both KITTI and Ford I should use the result at epoch 1, right?

  2. The training results for Ford vary a lot: sometimes epoch 0 is very high and epoch 1 is very low, then the accuracy increases again in the next epoch. KITTI does not have this problem. I am not sure whether it is related to the random seed settings, though it seems that should not be an issue. For example, the results of Ford Log 1:

====================================
EPOCH: 0
lateral within 1 meters pred: 32.285714285714285
longitudinal within 1 meters pred: 6.095238095238095
lateral within 3 meters pred: 64.19047619047619
longitudinal within 3 meters pred: 16.904761904761905
lateral within 5 meters pred: 71.66666666666667
longitudinal within 5 meters pred: 27.142857142857142
within 1 degrees pred: 18.761904761904763
within 3 degrees pred: 49.476190476190474
within 5 degrees pred: 69.52380952380952
EPOCH: 1
lateral within 1 meters pred: 20.61904761904762
longitudinal within 1 meters pred: 5.761904761904762
lateral within 3 meters pred: 54.095238095238095
longitudinal within 3 meters pred: 17.0
lateral within 5 meters pred: 67.38095238095238
longitudinal within 5 meters pred: 27.0
within 1 degrees pred: 21.428571428571427
within 3 degrees pred: 48.04761904761905
within 5 degrees pred: 64.95238095238095
EPOCH: 2
lateral within 1 meters pred: 44.57142857142857
longitudinal within 1 meters pred: 5.095238095238095
lateral within 3 meters pred: 73.42857142857143
longitudinal within 3 meters pred: 18.19047619047619
lateral within 5 meters pred: 76.23809523809524
longitudinal within 5 meters pred: 28.23809523809524
within 1 degrees pred: 53.57142857142857
within 3 degrees pred: 77.66666666666666
within 5 degrees pred: 85.19047619047619
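For reference, the within-threshold percentages in logs like the one above can be reproduced from per-sample absolute errors with a few lines of NumPy. This is a generic sketch of the metric (the function name and thresholds are my own, not taken from the released code):

```python
import numpy as np

def within_thresholds(abs_errors, thresholds=(1.0, 3.0, 5.0)):
    """Percentage of samples whose absolute error falls within each threshold."""
    abs_errors = np.asarray(abs_errors, dtype=float)
    return {t: 100.0 * np.mean(abs_errors <= t) for t in thresholds}

# Example: lateral errors (in meters) for five samples
print(within_thresholds([0.4, 2.1, 0.9, 6.3, 4.0]))
# → {1.0: 40.0, 3.0: 60.0, 5.0: 80.0}
```

The same function applies to longitudinal errors (meters) and azimuth errors (degrees).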

  3. For Ford Log 2 I am not able to reproduce the lateral result in the paper. Is the result in the paper the average over a certain number of training rounds, or the best instance of all runs?
|  | Lateral 1 | Lateral 3 | Lateral 5 | Longitudinal 1 | Longitudinal 3 | Longitudinal 5 | Azimuth 1 | Azimuth 3 | Azimuth 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Log 2 init | 4.78 | 15.72 | 25.38 | 4.91 | 15.08 | 25.17 | 10.06 | 30.43 | 49.96 |
| Paper | 31.20 | 66.46 | 78.27 | 4.80 | 15.27 | 25.76 | 9.74 | 30.83 | 51.62 |
| Pretrained | 30.35 | 65.92 | 77.89 | 4.53 | 15.91 | 26.19 | 14.97 | 41.45 | 63.19 |
| Retrain round 1 | 13.71 | 48.70 | 70.41 | 5.42 | 15.40 | 25.76 | 9.31 | 30.19 | 49.26 |
| Retrain round 2 | 17.17 | 59.94 | 83.07 | 4.67 | 14.89 | 25.33 | 10.12 | 31.31 | 51.33 |
| Retrain round 3 | 26.03 | 61.87 | 74.81 | 5.21 | 16.34 | 27.15 | 17.28 | 47.49 | 68.82 |
  4. About the init accuracy: I found that some trained instances obtain accuracy worse than, or only marginally better than, the init value for longitudinal and/or azimuth. Any insight on this?
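On the seed question in point 2: to rule out run-to-run nondeterminism, one common recipe is to pin every RNG before training starts. This is a generic PyTorch sketch (not taken from the released code), and note that cuDNN determinism trades some speed:

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42):
    """Fix the common sources of randomness in a PyTorch training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)          # also seeds CUDA RNGs on current devices
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Force deterministic cuDNN kernels (slower, but reproducible)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
```

Even with all seeds pinned, some GPU ops remain nondeterministic, so small run-to-run variation can persist.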

Best regards,
Wenmiao

Hi Wenmiao,

Thanks for the questions.

  1. Yes, the results reported in the paper are all from Epoch 1, for both the KITTI and Ford datasets.

  2. I have also observed this issue on the Ford dataset recently. I have tried different methods on it and found their performance to be consistently unstable; I am not sure whether this is due to inaccuracy in the GPS labels. My suggestions would be to either:
    (1) run the comparison algorithms N times on the Ford dataset and compare the average results and standard deviations; or
    (2) not compare algorithms on this dataset.

  3. (1) For the results reported in the paper, I did not run the training multiple times and report the average; it is a one-time result. It is possible that you cannot reproduce the results due to the randomness discussed in point 2.
    (2) The discrepancy between the results reproduced with the released pre-trained model and those reported in the paper should be caused by a slight difference between the code used when I was preparing the submission and the released code. You may also find that the released model produces better results than those reported in the paper on some other test sets. Since the differences are not significant, I did not dig into the problem to make the two versions strictly aligned.

  4. On longitudinal pose optimization: yes, the optimized results are sometimes even inferior to the initial ones. My current speculation is that the longitudinal pose ambiguity is too significant to resolve. Our method also has no scheme to determine whether the initial pose is better than the optimized pose.
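To illustrate suggestion (1) above, aggregating a metric over several retraining rounds is a one-liner with the standard library. The numbers here are the lateral-within-1-m values from the three retraining rounds quoted in the question:

```python
import statistics

# Lateral recall @ 1 m from the three Ford Log 2 retraining rounds above
runs = [13.71, 17.17, 26.03]

mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation (n - 1 denominator)
print(f"lateral@1m: {mean:.2f} ± {std:.2f} over {len(runs)} runs")
# → lateral@1m: 18.97 ± 6.35 over 3 runs
```

A spread this large relative to the mean is a good signal that single-run comparisons on this dataset are not meaningful.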

Best,
Yujiao