Query regarding stage 3 training.

Question

danishnazir opened this issue 3 years ago · 2 comments

Thank you for the great work.
I am currently running stage 3 training to further refine the depth maps. However, the RMSE is not great after 15 epochs: it is still lingering around 860-890. After how many epochs do you generally see the RMSE drop, especially in the third stage? Also, are the stage 3 hyperparameters in the code repository the same as the ones you used in your stage 3 experiment?

Answer 1 · 2021-09-07T00:18:46.000Z

Thanks for your interest! The hyperparameters are the same.

Stage 3 training is time-consuming if you follow the original settings in the repository or paper (about 100 epochs for stages 1~3; see the earlier issues). So if more computational resources are available, we strongly suggest adjusting the hyperparameters (resolution, batch size, learning rate schedule) to accelerate training.

The RMSE usually gets lower when the learning rate steps down. However, it is not normal for the RMSE to be 860-890 after 15 epochs in stage 3 (about 40~50 epochs in total). Did you load the model trained in stages 1 and 2?
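
As a rough illustration of what "learning rate steps down" means, here is a minimal sketch of a step decay schedule in plain Python. The function name `step_lr`, the base learning rate, the milestone epochs, and the decay factor are all made-up values for illustration; they are not the settings used in the repository or paper.

```python
def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Return the learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once for every milestone
    epoch that has already been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)


# Hypothetical values, for illustration only (not the repository's settings).
base_lr = 1e-3
milestones = [30, 40, 50]

for epoch in (0, 29, 30, 45, 60):
    print(epoch, step_lr(base_lr, epoch, milestones, gamma=0.5))
```

With a schedule like this, the learning rate stays flat between milestones, so the validation RMSE can plateau for many epochs and then improve shortly after each drop.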

Answer 2 · 2021-09-07T08:56:05.000Z

Thanks for your prompt reply!
I have a lot of resources available and am running multiple hyperparameter settings. With the original hyperparameter settings, which include cropping at low resolution, the RMSE is not great: I am at 830 at the 48th epoch (before stage 3 I was at 773). However, my training loss is also at 950 RMSE, so I assume that the validation loss will go down as the training loss goes down.
In another experiment I changed the original hyperparameter settings: I am using full-resolution images with a batch size of 16 and a slightly higher learning rate. In that experiment I reach 760 RMSE after the 40th epoch, so I guess this setting works for me.