yizt/keras-faster-rcnn

NAN

wudi00 opened this issue · 9 comments

When training on my own dataset, I run into this: loss=nan
ETA: 5:39:36 - loss: 18.5223 - rpn_bbox_loss: 11.2649 - rpn_class_loss: 2.0140 - rcnn_bbox_loss: 1.3287 - rcnn_class_loss: 3.9144 - regular_loss: 3.4468e-04
2/842 [..............................] - ETA: 2:54:25 - loss: 16.3641 - rpn_bbox_loss: 9.5598 - rpn_class_loss: 1.8381 - rcnn_bbox_loss: 1.2540 - rcnn_class_loss: 3.7118 - regular_loss: 3.4468e-04
3/842 [..............................] - ETA: 1:58:35 - loss: 14.9247 - rpn_bbox_loss: 8.3631 - rpn_class_loss: 1.7178 - rcnn_bbox_loss: 1.2046 - rcnn_class_loss: 3.6388 - regular_loss: 3.4468e-04
4/842 [..............................] - ETA: 1:30:52 - loss: 14.4434 - rpn_bbox_loss: 8.1286 - rpn_class_loss: 1.5437 - rcnn_bbox_loss: 1.1893 - rcnn_class_loss: 3.5815 - regular_loss: 3.4468e-04
5/842 [..............................] - ETA: 1:14:10 - loss: 14.2252 - rpn_bbox_loss: 7.9830 - rpn_class_loss: 1.5132 - rcnn_bbox_loss: 1.1992 - rcnn_class_loss: 3.5295 - regular_loss: 3.4468e-04
6/842 [..............................] - ETA: 1:02:52 - loss: 14.2799 - rpn_bbox_loss: 7.9283 - rpn_class_loss: 1.6065 - rcnn_bbox_loss: 1.2130 - rcnn_class_loss: 3.5317 - regular_loss: 3.4468e-04
7/842 [..............................] - ETA: 54:53 - loss: inf - rpn_bbox_loss: inf - rpn_class_loss: 1.5222 - rcnn_bbox_loss: 1.1792 - rcnn_class_loss: 3.4516 - regular_loss: 3.4468e-04
8/842 [..............................] - ETA: 48:57 - loss: nan - rpn_bbox_loss: nan - rpn_class_loss: 1.4091 - rcnn_bbox_loss: 1.1104 - rcnn_class_loss: 3.4306 - regular_loss: nan
Could you tell me what is going wrong here?

yizt commented

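@wudi00 Hello. rpn_bbox_loss: inf. My guess is that the anchor sizes are poorly chosen, which makes rpn_bbox_loss very large, because the RPN box regression step ensures that every GT box is matched to at least one positive anchor (even if IoU < 0.7).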
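To illustrate why a forced match with a badly sized anchor inflates the regression loss, here is a minimal sketch of the standard Faster R-CNN box parameterization. It assumes boxes in (y1, x1, y2, x2) order and no extra normalization of the deltas; the function name `box_regression_targets` is hypothetical, and the repository's own target code may differ in detail.

```python
import math


def box_regression_targets(anchor, gt):
    """Return (dy, dx, dh, dw) deltas that map an anchor onto a GT box.

    Both boxes are (y1, x1, y2, x2) and must have positive height and width.
    """
    ah, aw = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gh, gw = gt[2] - gt[0], gt[3] - gt[1]
    # Box centers
    acy, acx = anchor[0] + 0.5 * ah, anchor[1] + 0.5 * aw
    gcy, gcx = gt[0] + 0.5 * gh, gt[1] + 0.5 * gw
    # Center offsets are scaled by the anchor size; sizes use a log ratio
    return (
        (gcy - acy) / ah,
        (gcx - acx) / aw,
        math.log(gh / ah),
        math.log(gw / aw),
    )


# An anchor close to the GT box gives small targets.
well_matched = box_regression_targets((0, 0, 100, 100), (5, 5, 105, 105))

# A small anchor forced onto a much larger GT box gives large targets.
poorly_matched = box_regression_targets((0, 0, 32, 32), (10, 10, 410, 410))

print(well_matched)
print(poorly_matched)
```

The center offsets are divided by the anchor size, so a small anchor matched to a distant, much larger GT box produces targets that are orders of magnitude larger than a well-matched pair.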

@yizt So would changing the sizes of the generated anchors fix it? Or is there any other solution?

yizt commented

@wudi00 Take a look at the value of the custom metric rpn_gt_min_max_iou; it will tell you whether the anchor sizes are appropriate.
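The metric can also be approximated offline. The sketch below assumes, based only on its name, that rpn_gt_min_max_iou is the minimum over GT boxes of each GT box's best IoU with any anchor; the helper names `iou` and `gt_min_max_iou` are hypothetical and are not taken from the repository.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gt_min_max_iou(gt_boxes, anchors):
    """Worst case over GT boxes of the best IoU any anchor achieves."""
    return min(max(iou(gt, anchor) for anchor in anchors) for gt in gt_boxes)


# Toy data: the 256x256 GT box is larger than every anchor.
anchors = [(0, 0, 64, 64), (0, 0, 128, 128)]
gt_boxes = [(0, 0, 64, 64), (0, 0, 256, 256)]

score = gt_min_max_iou(gt_boxes, anchors)
print(score)
```

A low value means at least one GT box has no anchor that overlaps it well, which is the situation where the forced positive match produces very large regression targets.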

@yizt I reset the anchor sizes. Now rpn_bbox_loss=nan shows up after a few dozen training steps, and rpn_gt_min_max_iou is around 0.25. How should I adjust the anchors in this case?

yizt commented

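@wudi00 That IoU value is too small. Adjusting the anchor sizes means matching them as closely as possible to the sizes of your GT boxes; I'm afraid I can't be more specific than that.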
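One way to see which sizes the GT boxes actually have is to summarize their scale and aspect ratio before choosing anchors. This is only a sketch under assumptions: boxes are (y1, x1, y2, x2) with positive height and width, scale is taken as sqrt(height * width), ratio as height / width, and `summarize_gt_boxes` is a hypothetical helper, not part of the repository.

```python
import math
import statistics


def summarize_gt_boxes(gt_boxes):
    """Summarize GT box scale (sqrt of area) and aspect ratio (height / width)."""
    scales = sorted(math.sqrt((y2 - y1) * (x2 - x1)) for y1, x1, y2, x2 in gt_boxes)
    ratios = [(y2 - y1) / (x2 - x1) for y1, x1, y2, x2 in gt_boxes]
    return {
        "scale_min": scales[0],
        "scale_median": statistics.median(scales),
        "scale_max": scales[-1],
        "ratio_median": statistics.median(ratios),
    }


# Toy dataset of wide, flat boxes (height is a quarter of the width).
gt_boxes = [(0, 0, 20, 80), (0, 0, 30, 120), (0, 0, 50, 200)]

summary = summarize_gt_boxes(gt_boxes)
print(summary)
```

If the anchor scales and ratios in the configuration fall far outside the ranges reported here, that would be consistent with a low rpn_gt_min_max_iou.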

@yizt I'd like to lower the IoU threshold in the RPN. Where is the IoU threshold set? Thanks.

yizt commented

@wudi00 In target.py, line 113. Lowering it won't solve this problem, though.

@yizt Hi, after making some adjustments I now get rcnn_class_loss: nan. What could be causing this?

yizt commented

@wudi00 I haven't run into this problem and can't pinpoint the cause. Sorry!