About network training
YiLiM1 opened this issue · 6 comments
YiLiM1 commented
Hello, when I use the training method from your article (training the network on NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?
YvanYin commented
Hi, I have trained the model on KITTI and NYU alone, and I did not encounter this problem.
wuzht commented
Same here, the network does not converge.
YvanYin commented
Could you show your loss and learning rate here?
wuzht commented
[Step 73530/86850] [Epoch 25/30] [kitti]
loss: 9.829, time: 1.526856, eta: 5:38:57
metric_loss: 2.618, virtual_normal_loss: 7.343, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73540/86850] [Epoch 25/30] [kitti]
loss: 9.734, time: 1.526918, eta: 5:38:43
metric_loss: 2.651, virtual_normal_loss: 7.094, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73550/86850] [Epoch 25/30] [kitti]
loss: 9.716, time: 1.526981, eta: 5:38:28
metric_loss: 2.611, virtual_normal_loss: 7.137, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000093, group1_lr: 0.000093,
[Step 73560/86850] [Epoch 25/30] [kitti]
loss: 9.999, time: 1.526977, eta: 5:38:13
metric_loss: 2.613, virtual_normal_loss: 7.250, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73570/86850] [Epoch 25/30] [kitti]
loss: 10.003, time: 1.526970, eta: 5:37:58
metric_loss: 2.659, virtual_normal_loss: 7.324, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73580/86850] [Epoch 25/30] [kitti]
loss: 9.877, time: 1.527021, eta: 5:37:43
metric_loss: 2.666, virtual_normal_loss: 7.326, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73590/86850] [Epoch 25/30] [kitti]
loss: 9.916, time: 1.527081, eta: 5:37:29
metric_loss: 2.626, virtual_normal_loss: 7.350, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73600/86850] [Epoch 25/30] [kitti]
loss: 9.988, time: 1.527141, eta: 5:37:14
metric_loss: 2.641, virtual_normal_loss: 7.360, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73610/86850] [Epoch 25/30] [kitti]
loss: 10.206, time: 1.527199, eta: 5:37:00
metric_loss: 2.674, virtual_normal_loss: 7.393, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73620/86850] [Epoch 25/30] [kitti]
loss: 9.851, time: 1.527234, eta: 5:36:45
metric_loss: 2.592, virtual_normal_loss: 7.259, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73630/86850] [Epoch 25/30] [kitti]
loss: 9.606, time: 1.527297, eta: 5:36:30
metric_loss: 2.572, virtual_normal_loss: 7.096, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73640/86850] [Epoch 25/30] [kitti]
loss: 9.606, time: 1.527356, eta: 5:36:16
metric_loss: 2.516, virtual_normal_loss: 7.096, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73650/86850] [Epoch 25/30] [kitti]
loss: 9.705, time: 1.527416, eta: 5:36:01
metric_loss: 2.519, virtual_normal_loss: 7.210, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73660/86850] [Epoch 25/30] [kitti]
loss: 9.985, time: 1.527482, eta: 5:35:47
metric_loss: 2.622, virtual_normal_loss: 7.357, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73670/86850] [Epoch 25/30] [kitti]
loss: 9.811, time: 1.527546, eta: 5:35:33
metric_loss: 2.641, virtual_normal_loss: 7.216, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73680/86850] [Epoch 25/30] [kitti]
loss: 9.615, time: 1.527540, eta: 5:35:17
metric_loss: 2.521, virtual_normal_loss: 7.118, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73690/86850] [Epoch 25/30] [kitti]
loss: 9.613, time: 1.527537, eta: 5:35:02
metric_loss: 2.503, virtual_normal_loss: 7.071, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73700/86850] [Epoch 25/30] [kitti]
loss: 9.863, time: 1.527586, eta: 5:34:47
metric_loss: 2.548, virtual_normal_loss: 7.352, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
[Step 73710/86850] [Epoch 25/30] [kitti]
loss: 9.806, time: 1.527648, eta: 5:34:33
metric_loss: 2.616, virtual_normal_loss: 7.270, abs_rel: 0.823165, silog: 0.586482,
group0_lr: 0.000092, group1_lr: 0.000092,
The validation error does not decrease during training. How can I fix this? Thanks.
wuzht commented
Note that I did not alter any training settings.
wuzht commented
Problem solved. I had to generate the dense depth maps from the sparse ones before training.
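For anyone hitting the same issue, here is a minimal sketch of what "generating dense depth maps from sparse ones" can look like. The thread does not say which densification method was actually used, so the nearest-neighbor fill below is only a stand-in. The function names `densify_nearest` and `valid_ratio` are hypothetical, and the sketch assumes that missing depth is encoded as 0 and that the map is a plain list of rows.

```python
from collections import deque


def valid_ratio(depth, invalid=0.0):
    """Fraction of pixels that carry a depth value (assumes `invalid` marks holes)."""
    total = sum(len(row) for row in depth)
    valid = sum(1 for row in depth for v in row if v != invalid)
    return valid / total if total else 0.0


def densify_nearest(sparse, invalid=0.0):
    """Fill every hole with the value of the nearest valid pixel.

    Uses a multi-source breadth-first search over 4-connected neighbors,
    so "nearest" means Manhattan distance. This is an illustrative
    stand-in, not the method used by the repository.
    """
    h = len(sparse)
    w = len(sparse[0]) if h else 0
    dense = [row[:] for row in sparse]  # copy so the input is left untouched
    filled = [[v != invalid for v in row] for row in sparse]

    # Start the search from every pixel that already has a measurement.
    queue = deque((r, c) for r in range(h) for c in range(w) if filled[r][c])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not filled[nr][nc]:
                dense[nr][nc] = dense[r][c]  # propagate the nearest valid depth
                filled[nr][nc] = True
                queue.append((nr, nc))
    return dense


# Toy 3x3 map with two measurements; zeros are holes.
sparse = [
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
]
dense = densify_nearest(sparse)
print("valid ratio before:", valid_ratio(sparse))
print("valid ratio after: ", valid_ratio(dense))
```

Checking `valid_ratio` on a few ground-truth maps before training is a quick way to tell whether the depth maps are still sparse. For real data, an array-based or interpolation-based fill would likely be faster and smoother than this pure-Python version.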