Question about `detach`
zen-d opened this issue · 7 comments
Hi,
Thanks for the nice work. For calculating the IoU target, I think `detach` should be applied to the "predicted items". Specifically, in my view, for these two lines where `detach` is applied, the `detach` should be moved to their preceding lines, i.e. lines 408 and 424. Correct me if I'm missing something.
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L409
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L425
Could you please explain it more? Thanks.
Hi, we would like the gradient of the losses to flow to `pos_decoded_bbox_preds` and `pos_decoded_bbox_preds_refine` so as to optimize the predictions, so we don't detach them. Conversely, we don't want the gradient to back-propagate to `pos_decoded_target_preds`.
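The effect described above can be illustrated with a minimal, self-contained PyTorch sketch (the tensor values and the simple IoU helper below are hypothetical stand-ins, not the actual mmdet code): detaching only the target branch lets the loss gradient reach the predictions while leaving the target tensor untouched.

```python
import torch

# Hypothetical toy stand-ins for the decoded boxes discussed above:
# both come from the network, but only the prediction branch should
# receive gradients from the IoU-based loss.
pos_decoded_bbox_preds = torch.tensor([[0.0, 0.0, 4.0, 4.0]], requires_grad=True)
pos_decoded_target_preds = torch.tensor([[1.0, 1.0, 5.0, 5.0]], requires_grad=True)

def box_iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = torch.max(a[:, 0], b[:, 0])
    y1 = torch.max(a[:, 1], b[:, 1])
    x2 = torch.min(a[:, 2], b[:, 2])
    y2 = torch.min(a[:, 3], b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

# Detach only the target side: gradients from the loss reach the
# predictions but never touch the target tensor.
iou = box_iou(pos_decoded_bbox_preds, pos_decoded_target_preds.detach())
loss = (1 - iou).sum()
loss.backward()

grad_to_preds = pos_decoded_bbox_preds.grad is not None    # True
grad_to_targets = pos_decoded_target_preds.grad is not None  # False
```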
@hyz-xmaster Thanks for your prompt reply.
For `pos_decoded_target_preds`, I guess it may not be a must; it seems common in machine learning that the gradient does not flow back to the "target".
For `pos_decoded_bbox_preds` and `pos_decoded_bbox_preds_refine`, I understand now that you intended to do that, but could you please share more insight into this gradient behavior? As also mentioned in your paper, VFL is somewhat like GFL, in which QFL uses the detached localization prediction to calculate the IoU for the classification loss (refer to the mmdet implementation here). In my experience, if `detach` is removed in this line for QFL, the final AP drops significantly. Could you please comment on the different design choices between VFL and QFL? Thanks.
@hyz-xmaster any update?
As I see in the QFL implementation, it uses `pos_decoded_bbox_preds` to compute `score` and then uses the `score` here. It seems QFL uses the `score` and `label` to compute the target for `cls_score`, which has a similar function to `pos_decoded_target_preds`. You should not propagate the gradient to targets.
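A minimal sketch of that target construction (hypothetical values; `num_classes`, `label`, and `score` below are illustrative, not the mmdet variables): the detached quality score replaces the 1 in the one-hot label, so the classification branch learns to predict localization quality without the gradient leaking back into the box branch.

```python
import torch

# Hypothetical QFL-style soft target: the IoU-based quality score is
# detached before being written into the one-hot classification target.
num_classes = 4
label = torch.tensor([2])                        # positive class index
score = torch.tensor([0.8], requires_grad=True)  # depends on the bbox preds

cls_target = torch.zeros(1, num_classes)
cls_target[0, label] = score.detach()  # soft one-hot target, no grad path
```

Because of the `detach`, `cls_target` stays outside the autograd graph; without it, the classification loss would also push gradients into the box branch.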
@hyz-xmaster Exactly, I see that QFL does not and should not propagate gradients to `pos_decoded_bbox_preds`. Then, back to my question above: why did VFL choose to propagate gradients to `pos_decoded_bbox_preds` without hurting the performance?
See this line: we detach the `iou_targets_ini` calculated from `pos_decoded_bbox_preds`, and this line. We only propagate gradients to `pos_decoded_bbox_preds` here.
@zen-d
Though I only took a cursory look at your question, I think you have a misconception. The line you quoted refers to the calculation of the box regression loss, in which the gradient of the predicted box regression values should not be detached. What's more, varifocal loss does not apply to the calculation of the regression loss; it serves as a classification loss. So the only thing we need to take care of is detaching the gradient of the classification target in this line.
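To make the final point concrete, here is a minimal, hypothetical sketch (the variable names and values are illustrative, not the vfnet_head.py code): the classification loss is computed against a detached quality target, so only the classification score receives gradients from it, while the box branch is trained separately by the regression loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical classification-target construction: the IoU with the
# ground truth depends on the box predictions, so it is detached
# before being used as the soft target of the classification loss.
cls_score = torch.tensor([0.3], requires_grad=True)    # predicted score
iou_with_gt = torch.tensor([0.7], requires_grad=True)  # depends on bbox preds

cls_target = iou_with_gt.detach()  # the only detach the cls loss needs
loss_cls = F.binary_cross_entropy(cls_score, cls_target)
loss_cls.backward()

score_gets_grad = cls_score.grad is not None     # True
target_gets_grad = iou_with_gt.grad is not None  # False
```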