dawn-ech/YOLC

Vanishing gradients


2024-09-13 10:32:11,572 - mmdet - INFO - Epoch [4][31250/31563] lr: 2.500e-03, eta: 10 days, 10:41:12, time: 0.403, data_time: 0.004, memory: 8149, loss_center_heatmap: nan, loss_xywh_coarse: nan, loss_xywh_coarse_l1: nan, loss_xywh_refine: nan, loss_xywh_refine_l1: nan, loss: nan, grad_norm: nan
2024-09-13 10:32:31,605 - mmdet - INFO - Epoch [4][31300/31563] lr: 2.500e-03, eta: 10 days, 10:38:22, time: 0.401, data_time: 0.004, memory: 8149, loss_center_heatmap: nan, loss_xywh_coarse: nan, loss_xywh_coarse_l1: nan, loss_xywh_refine: nan, loss_xywh_refine_l1: nan, loss: nan, grad_norm: nan
2024-09-13 10:33:31,678 - mmdet - INFO - Epoch [4][31350/31563] lr: 2.500e-03, eta: 10 days, 10:42:54, time: 1.201, data_time: 0.004, memory: 8149, loss_center_heatmap: nan, loss_xywh_coarse: nan, loss_xywh_coarse_l1: nan, loss_xywh_refine: nan, loss_xywh_refine_l1: nan, loss: nan, grad_norm: nan
2024-09-13 10:34:57,915 - mmdet - INFO - Epoch [4][31400/31563] lr: 2.500e-03, eta: 10 days, 10:52:13, time: 1.725, data_time: 0.004, memory: 8149, loss_center_heatmap: nan, loss_xywh_coarse: nan, loss_xywh_coarse_l1: nan, loss_xywh_refine: nan, loss_xywh_refine_l1: nan, loss: nan, grad_norm: nan
Why do the gradient and loss values vanish (turn into NaN) halfway through training, after which the results of every epoch are very low? Has the author encountered this situation?

Hello author. In addition, training uses roughly 15 GB of GPU memory; is that normal?

The original learning rate corresponds to 4 GPUs * 2 samples per GPU = a total batch size of 8, and it needs to be scaled linearly to match your own batch size. If your total batch size is 2, divide the learning rate by 4. The memory usage is normal.
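
For anyone hitting the same NaN losses, below is a minimal sketch of that linear scaling rule in mmdetection-style config syntax. Only the 2.5e-3 base learning rate and the 4 GPU * 2 samples-per-GPU setup come from this thread; the optimizer type and the other hyperparameters are placeholder assumptions, not the repo's actual config values.

```python
# Minimal sketch of the linear learning-rate scaling rule described above.
# Only the 2.5e-3 base lr and the 4 GPU x 2 samples/GPU setup come from this
# thread; the optimizer block below is a generic mmdetection-style example.
base_batch_size = 4 * 2      # 4 GPUs * 2 samples per GPU in the released config
base_lr = 2.5e-3             # learning rate shown in the training log above

total_batch_size = 2         # e.g. a single GPU with 2 samples per GPU
scaled_lr = base_lr * total_batch_size / base_batch_size  # = 6.25e-4 (base_lr / 4)

# In an mmdetection config, this value would replace the default lr, e.g.:
optimizer = dict(type='SGD', lr=scaled_lr, momentum=0.9, weight_decay=0.0001)
```

Reducing the learning rate in proportion to the smaller batch size is also the usual first thing to try when losses turn to NaN partway through training.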