longcw/yolo2-pytorch

Train an mAP 0.71 model by modifying 'mask' & 'scale'

cory8249 opened this issue · 5 comments

I traced the YOLOv2 C code over the last few days, and I think there is a misunderstanding about 'mask' and 'scale'.

In this PyTorch repo, the mask is used in the loss function. It helps the network focus on the correct anchor boxes instead of punishing other, irrelevant boxes.
self.iou_loss = nn.MSELoss(size_average=False)(iou_pred * iou_mask, _ious * iou_mask) / num_boxes
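For illustration, here is a minimal, self-contained sketch of how this masked loss behaves (the values and num_boxes are toy inputs; reduction='sum' is the modern equivalent of size_average=False):

```python
import torch
import torch.nn as nn

# Hypothetical values: two boxes, the second one masked out
iou_pred = torch.tensor([[0.9], [0.2]])
_ious    = torch.tensor([[1.0], [0.0]])
iou_mask = torch.tensor([[1.0], [0.0]])
num_boxes = 1

loss = nn.MSELoss(reduction='sum')(iou_pred * iou_mask, _ious * iou_mask) / num_boxes
print(loss)  # tensor(0.0100): only the unmasked box contributes, (0.9 - 1.0)^2
```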

So how should the right scale_mask be calculated?

YOLO's mask is based on the predicted objectness (0–1) of the box.
So if a box's predicted objectness is high (e.g. 0.9) but there is no ground truth at that position, it should be punished. The punishment is noobject_scale * (0 - predicted objectness):
l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]);
Hence, this term helps the network learn to give reasonable confidence to each box.
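For reference, a minimal Python transliteration of that darknet rule (the function name and values are illustrative, not from either repo):

```python
def noobject_delta(pred_objectness, noobject_scale):
    # darknet: l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]);
    return noobject_scale * (0.0 - pred_objectness)

# A confident but wrong box is punished much harder than a tentative one:
print(noobject_delta(0.9, 1.0))  # -0.9
print(noobject_delta(0.1, 1.0))  # -0.1
```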

However, in this repo
_iou_mask[best_ious <= cfg.iou_thresh] = cfg.noobject_scale
does not consider objectness. It punishes every unqualified box with the same value, so the detector learns objectness very poorly.
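A rough sketch of the direction of the fix, assuming NumPy arrays iou_pred_np (predicted objectness per anchor) and best_ious of the same shape; this illustrates the idea, not necessarily the exact code in darknet_v2.py:

```python
import numpy as np

# Hypothetical inputs: predicted objectness per anchor, and each anchor's
# best IoU against any ground-truth box.
iou_pred_np = np.random.rand(845, 1).astype(np.float32)
best_ious = np.random.rand(845, 1).astype(np.float32)
iou_thresh, noobject_scale = 0.6, 1.0

_iou_mask = np.zeros_like(iou_pred_np)
no_obj = best_ious <= iou_thresh
# Scale the punishment by the predicted objectness, mimicking darknet's
# noobject_scale * (0 - objectness), instead of using a constant mask
_iou_mask[no_obj] = noobject_scale * iou_pred_np[no_obj]
```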

This is the most obvious case; the other 'mask' and 'scale' values are also implemented the wrong way. YOLO actually has a more complicated policy for these scale masks (some if-else conditions). I also find that YOLO's loss is calculated before exp() and log(), not after.

By fixing the scale_mask bug, the VOC07 test mAP (trained on VOC07+12 trainval) increases from 0.67 to 0.71, which is much closer to yolo-voc-weights.h5 (0.7221).

You can refer to my code in darknet_v2.py. I am still debugging it and it is not complete yet; I just want to point out what I found.

Thank you!

@cory8249
In my understanding, l.delta in the darknet source code is the negative derivative of the loss with respect to the input value.

If the mask for the positions without ground-truth boxes is just l.noobject_scale, then the loss is defined as l.noobject_scale / 2 * (pred_iou - gt_iou) ^ 2, with gt_iou = 0. In this case, the negative derivative with respect to pred_iou should be l.noobject_scale * (0 - pred_iou), which is consistent with the darknet source code: l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]).

From the equation loss = l.noobject_scale / 2 * (pred_iou - gt_iou) ^ 2, the punishment for positions without ground-truth boxes depends on both noobject_scale and pred_iou. The negative derivative l.noobject_scale * (0 - pred_iou) shows the same thing. Thus as pred_iou grows (from 0 to 1), the punishment already grows with it, and it is not necessary to fold pred_iou into the mask to increase the punishment.
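This can be checked numerically with autograd (a toy sketch; the scalar values are arbitrary):

```python
import torch

noobject_scale = 1.0
pred_iou = torch.tensor(0.9, requires_grad=True)
gt_iou = torch.tensor(0.0)

# The loss as defined above: noobject_scale / 2 * (pred_iou - gt_iou)^2
loss = noobject_scale / 2 * (pred_iou - gt_iou) ** 2
loss.backward()

# The negative gradient matches darknet's delta, noobject_scale * (0 - pred_iou)
print(-pred_iou.grad)                 # tensor(-0.9000)
print(noobject_scale * (0.0 - 0.9))   # -0.9
```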

So I think the previous implementation _iou_mask[best_ious < cfg['iou_thresh']] = cfg['noobject_scale'] * 1 is reasonable and consistent with the darknet source code.

Hi @cory8249,
I found your yolo2-pytorch code in your repository, but I find it hard to compare it with longcw's original version.
Can you please list all the modifications you made to improve the mAP to 0.71?
By the way, is darknet_training_v3.py the script that obtains the 0.71 mAP?

@JesseYang Your argument makes sense to me, and I tend to agree with it, but when I look at the current source code I see that @cory8249's version is being used. Why is this? It seems like iou_mask should simply be cfg.noobject_scale wherever there is no object. Is this wrong?

I agree with @JesseYang's points, and to match the original code I guess it should be
_iou_mask[best_ious < cfg['iou_thresh']] = math.sqrt(0.5 * cfg['noobject_scale']) (and likewise for the high-IoU anchors). Since the mask is multiplied in before the error is squared, the mask itself gets squared, so sqrt(0.5 * noobject_scale) makes the effective loss coefficient noobject_scale / 2.
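A quick numerical check of this (toy values; reduction='sum' stands in for the repo's size_average=False):

```python
import math
import torch
import torch.nn as nn

noobject_scale = 1.0
pred_iou = torch.tensor([0.9], requires_grad=True)
gt_iou = torch.tensor([0.0])

# The mask gets squared inside the MSE, giving coefficient noobject_scale / 2
mask = math.sqrt(0.5 * noobject_scale)
loss = nn.MSELoss(reduction='sum')(pred_iou * mask, gt_iou * mask)
loss.backward()

print(loss.item())    # 0.405 == noobject_scale / 2 * 0.9**2
print(pred_iou.grad)  # tensor([0.9000]) == noobject_scale * (pred_iou - 0)
```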

I'm running an experiment to test these settings.
The results are quite similar to (and, with 416×416 input, a little better than) what I got from the 'master' version, which is currently 72.3%.