NAN
wudi00 opened this issue · 9 comments
When training on my own dataset, the loss becomes NaN (loss=nan):
ETA: 5:39:36 - loss: 18.5223 - rpn_bbox_loss: 11.2649 - rpn_class_loss: 2.0140 - rcnn_bbox_loss: 1.3287 - rcnn_class_loss: 3.9144 - regular_loss: 3.4468e-04
2/842 [..............................] - ETA: 2:54:25 - loss: 16.3641 - rpn_bbox_loss: 9.5598 - rpn_class_loss: 1.8381 - rcnn_bbox_loss: 1.2540 - rcnn_class_loss: 3.7118 - regular_loss: 3.4468e-04
3/842 [..............................] - ETA: 1:58:35 - loss: 14.9247 - rpn_bbox_loss: 8.3631 - rpn_class_loss: 1.7178 - rcnn_bbox_loss: 1.2046 - rcnn_class_loss: 3.6388 - regular_loss: 3.4468e-04
4/842 [..............................] - ETA: 1:30:52 - loss: 14.4434 - rpn_bbox_loss: 8.1286 - rpn_class_loss: 1.5437 - rcnn_bbox_loss: 1.1893 - rcnn_class_loss: 3.5815 - regular_loss: 3.4468e-04
5/842 [..............................] - ETA: 1:14:10 - loss: 14.2252 - rpn_bbox_loss: 7.9830 - rpn_class_loss: 1.5132 - rcnn_bbox_loss: 1.1992 - rcnn_class_loss: 3.5295 - regular_loss: 3.4468e-04
6/842 [..............................] - ETA: 1:02:52 - loss: 14.2799 - rpn_bbox_loss: 7.9283 - rpn_class_loss: 1.6065 - rcnn_bbox_loss: 1.2130 - rcnn_class_loss: 3.5317 - regular_loss: 3.4468e-04
7/842 [..............................] - ETA: 54:53 - loss: inf - rpn_bbox_loss: inf - rpn_class_loss: 1.5222 - rcnn_bbox_loss: 1.1792 - rcnn_class_loss: 3.4516 - regular_loss: 3.4468e-04
8/842 [..............................] - ETA: 48:57 - loss: nan - rpn_bbox_loss: nan - rpn_class_loss: 1.4091 - rcnn_bbox_loss: 1.1104 - rcnn_class_loss: 3.4306 - regular_loss: nan
What could be causing this?
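@wudi00 Hi, rpn_bbox_loss is inf. My guess is that the anchor sizes are not set appropriately for your data, which makes rpn_bbox_loss very large. The RPN box-regression step guarantees that every GT box is matched to at least one positive anchor (even if its IoU is below 0.7), so a GT box whose size or shape is far from every anchor still gets a positive anchor, and the regression targets for that poorly matched anchor can be very large.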
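To illustrate how a poorly sized anchor set can inflate rpn_bbox_loss, here is a minimal Python sketch of the matching rule described above, combined with the usual Faster R-CNN (dx, dy, dw, dh) box encoding. It is a simplified illustration under assumptions, not this project's implementation: the helper names `iou`, `regression_targets`, and `assign_positives`, the (x1, y1, x2, y2) pixel box format, and the example boxes are all hypothetical.

```python
import math

# Hypothetical, simplified sketch of Faster R-CNN-style RPN target assignment.
# Boxes are (x1, y1, x2, y2) in pixels; this is NOT the project's actual code.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def regression_targets(anchor, gt):
    """Standard (dx, dy, dw, dh) encoding of a GT box relative to an anchor."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    acx, acy = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    # Center offsets are scaled by the anchor size; width/height use a log ratio,
    # so an anchor that is much smaller or larger than the GT yields big targets.
    return ((gcx - acx) / aw, (gcy - acy) / ah,
            math.log(gw / aw), math.log(gh / ah))

def assign_positives(anchors, gts, pos_iou=0.7):
    """Return {anchor_index: gt_index} for positive anchors.

    An anchor is positive if its best IoU with any GT is >= pos_iou. In addition,
    the best-matching anchor of every GT is forced positive even when its IoU is
    below pos_iou, so every GT gets at least one positive anchor.
    """
    positives = {}
    if not gts:
        return positives
    for i, a in enumerate(anchors):
        ious = [iou(a, g) for g in gts]
        j = max(range(len(gts)), key=lambda k: ious[k])
        if ious[j] >= pos_iou:
            positives[i] = j
    for j, g in enumerate(gts):
        best = max(range(len(anchors)), key=lambda i: iou(anchors[i], g))
        positives[best] = j  # forced match, regardless of how low the IoU is
    return positives

# Usage: only 32x32 anchors, but a long, thin 400x20 GT box (both hypothetical).
anchors = [(0, 0, 32, 32), (100, 100, 132, 132)]
gts = [(0, 0, 400, 20)]
for i, j in assign_positives(anchors, gts).items():
    print(i, round(iou(anchors[i], gts[j]), 3), regression_targets(anchors[i], gts[j]))
```

In this example the only positive anchor is a forced match with an IoU of about 0.076, and its targets include dx = 5.75 and dw = log(12.5) ≈ 2.53, whereas a well-matched anchor would have targets near zero. Targets this large give large regression losses and gradients, which can destabilize training; comparing the anchor scales and aspect ratios against the GT box sizes in your dataset is a reasonable first check.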