yuantn/MI-AOD

Hi, I have a question about training stability

Overcautious opened this issue · 1 comment

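I am training on my own dataset. During the first round of training, every loss term was normal and evaluation on the val set gave a reasonable mAP. During the second round, however, the losses suddenly became abnormal (they turn into nan in the log), which made all subsequent training collapse, including the mAP in the evaluation stage.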

2022-07-09 19:33:35,677 - mmdet - INFO - Epoch [3][700/5885]	lr: 1.000e-03, eta: 0:38:52, time: 0.542, data_time: 0.003, memory: 1256, l_det_cls: 0.1581, l_det_loc: 0.4424, l_imgcls: 0.3133, L_det: 0.9138
2022-07-09 19:34:01,040 - mmdet - INFO - Epoch [3][750/5885]	lr: 1.000e-03, eta: 0:38:31, time: 0.507, data_time: 0.002, memory: 1256, l_det_cls: 0.1491, l_det_loc: 0.4257, l_imgcls: 0.3133, L_det: 0.8881
2022-07-09 19:34:30,435 - mmdet - INFO - Epoch [3][800/5885]	lr: 1.000e-03, eta: 0:38:11, time: 0.588, data_time: 0.002, memory: 1256, l_det_cls: 0.1511, l_det_loc: 0.4848, l_imgcls: 0.3133, L_det: 0.9492
2022-07-09 19:34:57,035 - mmdet - INFO - Epoch [3][850/5885]	lr: 1.000e-03, eta: 0:37:50, time: 0.532, data_time: 0.003, memory: 1256, l_det_cls: nan, l_det_loc: nan, l_imgcls: nan, L_det: nan
2022-07-09 19:35:24,563 - mmdet - INFO - Epoch [3][900/5885]	lr: 1.000e-03, eta: 0:37:30, time: 0.551, data_time: 0.002, memory: 1256, l_det_cls: nan, l_det_loc: nan, l_imgcls: nan, L_det: nan
2022-07-09 19:35:49,460 - mmdet - INFO - Epoch [3][950/5885]	lr: 1.000e-03, eta: 0:37:08, time: 0.498, data_time: 0.002, memory: 1256, l_det_cls: nan, l_det_loc: nan, l_imgcls: nan, L_det: nan
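
To narrow down where the divergence starts, the log can be scanned for the first line whose loss values are no longer finite. The following is a minimal sketch, not part of MI-AOD or mmdet: the helper name `first_non_finite` is hypothetical, and the regular expressions assume the exact log layout and loss names (`l_det_cls`, `l_det_loc`, `l_imgcls`, `L_det`) shown above.

```python
import math
import re

# Matches e.g. "Epoch [3][850/5885]" and captures the epoch and iteration.
STEP_RE = re.compile(r"Epoch \[(\d+)\]\[(\d+)/\d+\]")
# Matches "name: value" for the loss terms that appear in the log above.
LOSS_RE = re.compile(r"\b(l_det_cls|l_det_loc|l_imgcls|L_det): ([^,\s]+)")


def first_non_finite(lines):
    """Return (epoch, iteration, bad_loss_names) for the first log line
    that contains a nan/inf loss, or None if every loss is finite.

    Hypothetical helper for illustration only.
    """
    for line in lines:
        step = STEP_RE.search(line)
        if step is None:
            continue  # not a training-iteration line
        bad = [
            name
            for name, value in LOSS_RE.findall(line)
            if not math.isfinite(float(value))
        ]
        if bad:
            return int(step.group(1)), int(step.group(2)), bad
    return None
```

Applied to the six log lines above, this would report epoch 3, iteration 850, with all four loss terms affected. Since the log is written every 50 iterations, that only says the first nan appeared somewhere between iterations 800 and 850; it does not identify the cause.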

// Subsequent mAP
+-------+-------+------+--------+-------+
| class | gts   | dets | recall | ap    |
+-------+-------+------+--------+-------+
| text  | 34200 | 0    | 0.000  | 0.000 |
| link  | 31581 | 0    | 0.000  | 0.000 |
+-------+-------+------+--------+-------+
| mAP   |       |      |        | 0.000 |
+-------+-------+------+--------+-------+

What exactly went wrong here?

Resolved.