Still cannot reproduce your experiment, please help!
bbbrian94 opened this issue · 8 comments
Thank you again for the newly provided model.
What confuses me is:
I have been working with the new baseline model for the odom split that you uploaded. Starting from that point, with the exact same train and solver files you provided here for the eigen split, the results are still not very good, e.g. about 26 for translational RMSE (%) and 9 for rotational error (deg/100m) on KITTI odometry sequence 09.
Originally posted by @bbbrian94 in #28 (comment)
Probably my dataset generator for the KITTI Odometry split is wrong? Would you mind providing that part?
Hi @bbbrian94 , sorry for the late reply.
Regarding your previous question about the learning rate: as mentioned in the paper, we adjust the learning rate manually after seeing that the loss has converged.
After checking my experiment log for the Temporal experiment, I used 1e-3 for finetuning the Baseline model for 200k iterations and then used 1e-4 for another 100k iterations. Sorry for not making this clear enough before.
You may try this setting and let me know.
Hi @bbbrian94 , I just added my previous odom dataset builder to the current project. I did a simple test and it seems to work fine. Please check if it works for you.
Please check this temporary branch.
https://github.com/Huangying-Zhan/Depth-VO-Feat/blob/kitti_odom_builder/data/dataset_builder.py
Wow! Let me try. Thank you for your help!
After following your retraining steps the results do become better, but they are still not very close to yours. The results are as below:
Sequence: 9
Average translational RMSE (%): 17.654958499450835
Average rotational error (deg/100m): 6.814821872909047
Sequence: 10
Average translational RMSE (%): 17.604867216912844
Average rotational error (deg/100m): 5.685002371454657
So is the stepsize still 80k instead of 120k for the odom split training? I found a comment in the solver saying the stepsize is 120k, but you use 80k for the eigen split training.
Also, it seems odd that in your newly uploaded data_builder code the db saves for K and T_R2L are commented out. Should I uncomment them for usage?
Regarding the data_builder for the odometry split, that is my mistake. You should uncomment those lines for usage.
The reason they were commented out is that when I was testing the code before uploading, I didn't have Caffe installed on my current PC, so I commented out these lines along with `import caffe`. I uncommented `import caffe` afterwards but forgot to uncomment the other lines.
Thanks for pointing this out; I will update it soon.
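In case it helps while the fix is pending, here is a minimal, purely hypothetical sketch of what saving K and T_R2L into an LMDB with pycaffe can look like. The function name, db paths, keys and calibration values below are illustrative assumptions; the real (commented-out) lines in `data/dataset_builder.py` are what you should actually uncomment.

```python
# Hypothetical illustration only -- the real code lives (commented out) in
# data/dataset_builder.py; uncomment those lines rather than copying this.
import lmdb
import numpy as np
import caffe

def save_matrix_to_lmdb(db_path, key, mat):
    """Store a small matrix (e.g. 3x3 K or 4x4 T_R2L) as a Caffe Datum in LMDB."""
    arr = mat.astype(np.float64)[np.newaxis, :, :]       # Datum expects (C, H, W)
    datum = caffe.io.array_to_datum(arr)                  # pack the array into a Datum
    env = lmdb.open(db_path, map_size=int(1e9))           # map_size is a placeholder
    with env.begin(write=True) as txn:
        txn.put(key.encode('ascii'), datum.SerializeToString())
    env.close()

# Example usage with dummy values (the dataset builder reads the real
# calibration from the KITTI calib files):
K = np.array([[718.856, 0., 607.193],
              [0., 718.856, 185.216],
              [0., 0., 1.]])
T_R2L = np.eye(4)
T_R2L[0, 3] = -0.54                                       # approx. KITTI stereo baseline (m)
save_matrix_to_lmdb('./K_db', '{:010d}'.format(0), K)
save_matrix_to_lmdb('./T_R2L_db', '{:010d}'.format(0), T_R2L)
```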
For the stepsize: from my experiment log, the stepsize is 80k. I think the 120k comment was left there from some old experiments. You can use 80k.
Just to make it clear: from my experiment log, I first trained the Baseline model using the provided script.
Then I used the Baseline model as the depth model initialization for the Temporal experiment.
First, I used a 1e-3 learning rate with stepsize 80k and trained for 200k iterations.
Then, using the model trained in the first step as initialization, I used a 1e-4 learning rate and trained for another 100k iterations.
This information is from my experiment log. I hope it records the experiment correctly (since I did a clean-up).
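To make that schedule concrete, here is a small sketch using pycaffe's `SolverParameter`; it is not the repository's actual solver file. The net path, gamma, momentum, snapshot settings and the stage-2 stepsize are placeholder assumptions, so please cross-check them against the provided solver prototxt.

```python
# Sketch of the two-stage finetuning schedule described above, written with
# pycaffe's SolverParameter. Paths, gamma, momentum and snapshot settings are
# placeholders -- verify against the solver prototxt shipped with the repo.
from caffe.proto import caffe_pb2

def make_solver(train_net, base_lr, stepsize, max_iter, prefix):
    s = caffe_pb2.SolverParameter()
    s.train_net = train_net           # e.g. the provided Temporal train prototxt
    s.base_lr = base_lr
    s.lr_policy = 'step'
    s.gamma = 0.5                     # assumed decay factor; check the repo's solver
    s.stepsize = stepsize
    s.max_iter = max_iter
    s.momentum = 0.9                  # assumed
    s.snapshot = 20000                # assumed
    s.snapshot_prefix = prefix
    s.solver_mode = caffe_pb2.SolverParameter.GPU
    return s

# Stage 1: lr 1e-3, stepsize 80k, 200k iterations (Baseline model as init).
stage1 = make_solver('experiments/Temporal/train.prototxt', 1e-3, 80000, 200000,
                     'snapshots/temporal_stage1')
# Stage 2: lr 1e-4, 100k iterations, initialized from the stage-1 model.
# The stage-2 stepsize was not recorded in this thread; 80000 is only a placeholder.
stage2 = make_solver('experiments/Temporal/train.prototxt', 1e-4, 80000, 100000,
                     'snapshots/temporal_stage2')

with open('solver_stage1.prototxt', 'w') as f:
    f.write(str(stage1))
with open('solver_stage2.prototxt', 'w') as f:
    f.write(str(stage2))
```

Each stage would then be launched with the usual `caffe train -solver <solver> -weights <init>.caffemodel`, pointing `-weights` at the Baseline model for stage 1 and at the stage-1 snapshot for stage 2.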
Two things I would suggest are:
(1) For the second finetuning, maybe you can try 1e-4 for 200k. My other experiments are mainly trained for 200k but this one is 100k; that looks a little odd to me, but I can't recall why. Sorry about that.
(2) You may try increasing the batch size for the Temporal experiment. I remember that increasing the batch size is helpful for training the Pose network.
Hope it helps.
I already tried the first suggestion ("maybe you can try with 1e-4 for 200k") myself yesterday. I found it did not help very much. I also just want to make sure that the graphs below match (or are similar to) your intermediate results after finetuning the baseline model for 200k.
sequence_09.pdf
sequence_10.pdf
I will try manually reducing the learning rate on top of this, together with your second suggestion ("You may try to increase batch size for the Temporal experiment. I remember increasing batch size is helpful for training the Pose network."), and check the result. Thank you!
It doesn't look exactly the same as mine. Here are the results I got from finetuning the baseline model for 200k (first finetuning).
The quantitative results for these two sequences are:

| seq | trans. RMSE (%) | rot. error (deg/100m) |
| --- | --- | --- |
| 09 | 17.03 | 5.85 |
| 10 | 12.90 | 4.80 |
After the second finetuning:

| seq | trans. RMSE (%) | rot. error (deg/100m) |
| --- | --- | --- |
| 09 | 11.93 | 3.91 |
| 10 | 12.55 | 3.45 |
These results are a little different from the paper. I remember it is because of a difference in the downsampling function: for the paper's testing I was using Caffe's image downsampling.
For the released code, I was using Python's image downsampling.
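As a rough illustration (not the repo's actual evaluation code) of why this matters: two bilinear downsampling implementations can disagree slightly on pixel values, which is enough to shift the reported numbers a little. The image and target sizes below are placeholders.

```python
# Illustration only: different image-downsampling implementations (e.g. the
# skimage-based resize used inside Caffe's caffe.io.resize_image vs. OpenCV's
# cv2.resize) produce slightly different pixel values for the same frame.
import numpy as np
import cv2
from skimage.transform import resize as sk_resize

img = (np.random.rand(370, 1226, 3) * 255).astype(np.uint8)   # KITTI-like frame size
target_hw = (160, 608)                                         # placeholder network input size

opencv_out = cv2.resize(img, (target_hw[1], target_hw[0]),
                        interpolation=cv2.INTER_LINEAR).astype(np.float32)
skimage_out = sk_resize(img, target_hw, order=1,
                        preserve_range=True).astype(np.float32)

diff = np.abs(opencv_out - skimage_out)
print('max  per-pixel difference:', diff.max())
print('mean per-pixel difference:', diff.mean())
```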
Hope this can be helpful as a reference.