Question about `detach`
zen-d opened this issue · 7 comments
Hi,
Thanks for the nice work. For calculating the IoU target, I think `detach` should be applied to the "predicted items". Specifically, in my view, for these two lines where `detach` is applied, the `detach` should be moved to their preceding lines, i.e. lines 408 and 424. Correct me if I'm missing something.
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L409
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L425
Could you please explain it more? Thanks.
Hi, we would like the gradient of the losses to flow to `pos_decoded_bbox_preds` and `pos_decoded_bbox_preds_refine` so as to optimize the predictions, so we don't detach them. Conversely, we don't want the gradient to back-propagate to `pos_decoded_target_preds`.
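The effect described above can be illustrated with a minimal, self-contained PyTorch sketch (the tensor values and the simple IoU helper below are hypothetical stand-ins, not the actual mmdet code): detaching only the target branch lets the loss gradient reach the predictions while leaving the target tensor untouched.

```python
import torch

# Hypothetical toy stand-ins for the decoded boxes discussed above:
# both come from the network, but only the prediction branch should
# receive gradients from the IoU-based loss.
pos_decoded_bbox_preds = torch.tensor([[0.0, 0.0, 4.0, 4.0]], requires_grad=True)
pos_decoded_target_preds = torch.tensor([[1.0, 1.0, 5.0, 5.0]], requires_grad=True)

def box_iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = torch.max(a[:, 0], b[:, 0])
    y1 = torch.max(a[:, 1], b[:, 1])
    x2 = torch.min(a[:, 2], b[:, 2])
    y2 = torch.min(a[:, 3], b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

# Detach only the target side: gradients from the loss reach the
# predictions but never touch the target tensor.
iou = box_iou(pos_decoded_bbox_preds, pos_decoded_target_preds.detach())
loss = (1 - iou).sum()
loss.backward()

grad_to_preds = pos_decoded_bbox_preds.grad is not None    # True
grad_to_targets = pos_decoded_target_preds.grad is not None  # False
```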
@hyz-xmaster Thanks for your prompt reply.
For `pos_decoded_target_preds`, I guess it may not be a must; it seems common in machine learning that the gradient does not flow back to the "target".
For `pos_decoded_bbox_preds` and `pos_decoded_bbox_preds_refine`, I understand now that you intended to do that, but could you please share more insight into this gradient behavior? As also mentioned in your paper, VFL is somewhat like GFL, in which QFL uses the detached localization prediction to calculate the IoU for the classification loss (refer to the mmdet implementation here). In my experience, if `detach` is removed in this line for QFL, the final AP drops significantly. Could you please comment on the different design choices between VFL and QFL? Thanks.
@hyz-xmaster any update?
As I see in the QFL implementation, it uses `pos_decoded_bbox_preds` to compute `score` and then uses the `score` here. It seems QFL uses the `score` and `label` to compute the target for `cls_score`, which has a similar function to `pos_decoded_target_preds`. You should not propagate the gradient to targets.
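A minimal sketch of that target construction (hypothetical values; `num_classes`, `label`, and `score` below are illustrative, not the mmdet variables): the detached quality score replaces the 1 in the one-hot label, so the classification branch learns to predict localization quality without the gradient leaking back into the box branch.

```python
import torch

# Hypothetical QFL-style soft target: the IoU-based quality score is
# detached before being written into the one-hot classification target.
num_classes = 4
label = torch.tensor([2])                        # positive class index
score = torch.tensor([0.8], requires_grad=True)  # depends on the bbox preds

cls_target = torch.zeros(1, num_classes)
cls_target[0, label] = score.detach()  # soft one-hot target, no grad path
```

Because of the `detach`, `cls_target` stays outside the autograd graph; without it, the classification loss would also push gradients into the box branch.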
@hyz-xmaster Exactly, I see that QFL does not and should not propagate gradients to `pos_decoded_bbox_preds`. Then, back to my question above: why did VFL choose to propagate gradients to `pos_decoded_bbox_preds` without hurting the performance?
See this line: we detach the `iou_targets_ini` calculated from `pos_decoded_bbox_preds`, and this line. We only propagate gradients to `pos_decoded_bbox_preds` here.
@zen-d
Though I only took a cursory look at your question, I think you have a misconception. The line you quoted refers to the calculation of the box regression loss, in which the gradient of the predicted box regression values should not be detached. What's more, varifocal loss does not apply to the calculation of the regression loss; it serves as a classification loss. So the only thing we need to take care of is detaching the gradient of the classification target in this line.
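To make the final point concrete, here is a minimal, hypothetical sketch (the variable names and values are illustrative, not the vfnet_head.py code): the classification loss is computed against a detached quality target, so only the classification score receives gradients from it, while the box branch is trained separately by the regression loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical classification-target construction: the IoU with the
# ground truth depends on the box predictions, so it is detached
# before being used as the soft target of the classification loss.
cls_score = torch.tensor([0.3], requires_grad=True)    # predicted score
iou_with_gt = torch.tensor([0.7], requires_grad=True)  # depends on bbox preds

cls_target = iou_with_gt.detach()  # the only detach the cls loss needs
loss_cls = F.binary_cross_entropy(cls_score, cls_target)
loss_cls.backward()

score_gets_grad = cls_score.grad is not None     # True
target_gets_grad = iou_with_gt.grad is not None  # False
```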