YvanYin/VNL_Monocular_Depth_Prediction

About network training

YiLiM1 opened this issue · 6 comments

Hello, when I use the training method from your article (training the network on NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?

Hi, I have trained the model on KITTI and NYU separately, and we didn't face this problem.

wuzht commented

Same here, the network does not converge.

Could you show your loss and learning rate here?

wuzht commented
[Step 73530/86850] [Epoch 25/30]  [kitti]
                loss: 9.829,    time: 1.526856,    eta: 5:38:57
                metric_loss: 2.618,             virtual_normal_loss: 7.343,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73540/86850] [Epoch 25/30]  [kitti]
                loss: 9.734,    time: 1.526918,    eta: 5:38:43
                metric_loss: 2.651,             virtual_normal_loss: 7.094,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73550/86850] [Epoch 25/30]  [kitti]
                loss: 9.716,    time: 1.526981,    eta: 5:38:28
                metric_loss: 2.611,             virtual_normal_loss: 7.137,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000093,       group1_lr: 0.000093,
[Step 73560/86850] [Epoch 25/30]  [kitti]
                loss: 9.999,    time: 1.526977,    eta: 5:38:13
                metric_loss: 2.613,             virtual_normal_loss: 7.250,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73570/86850] [Epoch 25/30]  [kitti]
                loss: 10.003,    time: 1.526970,    eta: 5:37:58
                metric_loss: 2.659,             virtual_normal_loss: 7.324,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73580/86850] [Epoch 25/30]  [kitti]
                loss: 9.877,    time: 1.527021,    eta: 5:37:43
                metric_loss: 2.666,             virtual_normal_loss: 7.326,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73590/86850] [Epoch 25/30]  [kitti]
                loss: 9.916,    time: 1.527081,    eta: 5:37:29
                metric_loss: 2.626,             virtual_normal_loss: 7.350,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73600/86850] [Epoch 25/30]  [kitti]
                loss: 9.988,    time: 1.527141,    eta: 5:37:14
                metric_loss: 2.641,             virtual_normal_loss: 7.360,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73610/86850] [Epoch 25/30]  [kitti]
                loss: 10.206,    time: 1.527199,    eta: 5:37:00
                metric_loss: 2.674,             virtual_normal_loss: 7.393,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73620/86850] [Epoch 25/30]  [kitti]
                loss: 9.851,    time: 1.527234,    eta: 5:36:45
                metric_loss: 2.592,             virtual_normal_loss: 7.259,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73630/86850] [Epoch 25/30]  [kitti]
                loss: 9.606,    time: 1.527297,    eta: 5:36:30
                metric_loss: 2.572,             virtual_normal_loss: 7.096,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73640/86850] [Epoch 25/30]  [kitti]
                loss: 9.606,    time: 1.527356,    eta: 5:36:16
                metric_loss: 2.516,             virtual_normal_loss: 7.096,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73650/86850] [Epoch 25/30]  [kitti]
                loss: 9.705,    time: 1.527416,    eta: 5:36:01
                metric_loss: 2.519,             virtual_normal_loss: 7.210,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73660/86850] [Epoch 25/30]  [kitti]
                loss: 9.985,    time: 1.527482,    eta: 5:35:47
                metric_loss: 2.622,             virtual_normal_loss: 7.357,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73670/86850] [Epoch 25/30]  [kitti]
                loss: 9.811,    time: 1.527546,    eta: 5:35:33
                metric_loss: 2.641,             virtual_normal_loss: 7.216,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73680/86850] [Epoch 25/30]  [kitti]
                loss: 9.615,    time: 1.527540,    eta: 5:35:17
                metric_loss: 2.521,             virtual_normal_loss: 7.118,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73690/86850] [Epoch 25/30]  [kitti]
                loss: 9.613,    time: 1.527537,    eta: 5:35:02
                metric_loss: 2.503,             virtual_normal_loss: 7.071,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73700/86850] [Epoch 25/30]  [kitti]
                loss: 9.863,    time: 1.527586,    eta: 5:34:47
                metric_loss: 2.548,             virtual_normal_loss: 7.352,             abs_rel: 0.823165,       silog: 0.586482,
                group0_lr: 0.000092,       group1_lr: 0.000092,
[Step 73710/86850] [Epoch 25/30]  [kitti]
                loss: 9.806,    time: 1.527648,    eta: 5:34:33
                metric_loss: 2.616,             virtual_normal_loss: 7.270,             abs_rel: 0.823165,       silog: 0.586482, 
                group0_lr: 0.000092,       group1_lr: 0.000092, 

The validation error does not decrease during training. How can I fix this? Thanks.

wuzht commented

Note that I did not alter any training settings.

wuzht commented

Problem solved. I had to generate dense depth maps from the sparse ones before training.
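For anyone hitting the same issue: KITTI ground truth is sparse LiDAR depth, and if the loss is computed over mostly-empty maps it will plateau exactly like the log above. The thread doesn't say which densification method the authors use, so here is only a minimal illustrative sketch using nearest-neighbor completion (via `scipy.ndimage.distance_transform_edt`); the official preprocessing may instead use a colorization-based inpainting scheme, so treat this as a stand-in, not the repo's actual pipeline.

```python
import numpy as np
from scipy import ndimage

def densify_sparse_depth(sparse_depth):
    """Fill missing (<= 0) pixels with the value of the nearest valid pixel.

    Simple nearest-neighbor completion -- an illustrative stand-in, not
    necessarily the preprocessing used by the VNL repository.
    """
    invalid = sparse_depth <= 0
    # For each invalid pixel, find the indices of the nearest valid pixel.
    _, (rows, cols) = ndimage.distance_transform_edt(
        invalid, return_indices=True)
    return sparse_depth[rows, cols]

# Toy example: a 4x4 depth map with only two LiDAR returns.
sparse = np.zeros((4, 4), dtype=np.float32)
sparse[0, 0] = 5.0
sparse[3, 3] = 20.0
dense = densify_sparse_depth(sparse)
assert (dense > 0).all()  # every pixel now carries a depth value
```

Whatever method you use, run it over the KITTI ground-truth maps once as a preprocessing step and train on the densified output.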