Question about Varifocal loss

HAOCHENYE opened this issue 4 years ago · 7 comments

In the paper, the negative weight of the BCE loss is alpha * p^gamma. However, in varifocal_loss.py, the loss is implemented as:

focal_weight = target * (target > 0.0).float() + \
    alpha * (pred_sigmoid - target).abs().pow(gamma) * \
    (target <= 0.0).float()

Here the negative weight is alpha * |p - q|^gamma. Why?
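
For reference, the paper defines the Varifocal Loss as follows, where p is the predicted score and q is the target score (the IoU with the ground-truth box for a positive sample, 0 for a negative one):

```latex
\mathrm{VFL}(p, q) =
\begin{cases}
-q \left( q \log p + (1 - q) \log (1 - p) \right), & q > 0, \\
-\alpha \, p^{\gamma} \log (1 - p), & q = 0.
\end{cases}
```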

Answer 1 · 2020-12-30T23:47:29.000Z

This is the initial implementation of VFL, and I forgot to refine it.
alpha * (pred_sigmoid - target).abs().pow(gamma) * (target <= 0.0).float() is actually equal to alpha * pred_sigmoid.pow(gamma) * (target == 0.0).float(), because the formula contains the multiplier (target <= 0.0).float() and the target is always >= 0.

Answer 2 · 2020-12-31T01:21:35.000Z

Do you mean that alpha * pred_sigmoid.abs().pow(gamma) * (target <= 0.0).float() equals alpha * pred_sigmoid.pow(gamma) * (target == 0.0).float(), or that alpha * (pred_sigmoid - target).abs().pow(gamma) * (target <= 0.0).float() equals alpha * pred_sigmoid.pow(gamma) * (target == 0.0).float()? I would understand if it were the former.

According to the paper, the negative weight should be alpha * pred_sigmoid.abs().pow(gamma) * (target <= 0.0).float(). Is the formula in the paper the current version?

Answer 3 · 2020-12-31T01:36:11.000Z

Hi, target is the IoU so it is always >= 0, which implies target <= 0 <=> target == 0.
Therefore,
alpha * (pred_sigmoid - target).abs().pow(gamma) * (target <= 0.0).float() <=>
alpha * (pred_sigmoid - target).abs().pow(gamma) * (target == 0.0).float() <=>
alpha * pred_sigmoid.abs().pow(gamma) * (target == 0.0).float().
The last step holds because wherever the mask (target == 0.0) is nonzero, pred_sigmoid - target is just pred_sigmoid.
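
The equivalence can also be checked numerically. The following is a minimal scalar sketch in plain Python rather than the PyTorch tensor code; the function names are hypothetical, and alpha = 0.75 and gamma = 2.0 are assumed values chosen for illustration.

```python
import math


def weight_as_implemented(p, q, alpha=0.75, gamma=2.0):
    """Scalar analogue of focal_weight as written in varifocal_loss.py."""
    positive_part = q * float(q > 0.0)
    negative_part = alpha * abs(p - q) ** gamma * float(q <= 0.0)
    return positive_part + negative_part


def weight_as_in_paper(p, q, alpha=0.75, gamma=2.0):
    """Scalar analogue with the negative weight written as alpha * p^gamma."""
    positive_part = q * float(q > 0.0)
    negative_part = alpha * p ** gamma * float(q == 0.0)
    return positive_part + negative_part


# p plays the role of pred_sigmoid, q the role of target (an IoU, so q >= 0).
probabilities = [0.05, 0.3, 0.5, 0.9]
targets = [0.0, 0.2, 0.7, 1.0]

# Compare both forms over every (p, q) pair in the grid.
all_equal = all(
    math.isclose(weight_as_implemented(p, q), weight_as_in_paper(p, q))
    for p in probabilities
    for q in targets
)
print(all_equal)
```

The two forms would only differ for a negative target, which cannot occur when the target is an IoU.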

Answer 4 · 2020-12-31T01:37:45.000Z

Ohhh! Thanks, I understand it now.

Answer 5 · 2021-01-05T02:44:56.000Z

Hi @hyz-xmaster ,

I could not find the q circled in red anywhere in the code.
I also don't understand the term above the green line. Since log(1-p) is used for negative samples, why does it appear in the q > 0 case? In any case, I could not find the corresponding implementation in the code. This is how I understand the code:

Looking forward to your reply, thanks.

Answer 6 · 2021-01-05T03:00:59.000Z

Hi @feiyuhuahuo,

target in the code represents q in that formula.
q*log(p) + (1-q)*log(1-p) is the binary cross-entropy loss (up to sign), which is computed by F.binary_cross_entropy_with_logits. When q = 0, q*log(p) + (1-q)*log(1-p) reduces to log(1-p). When q > 0, it remains unchanged.
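
To make the reduction concrete, here is a minimal scalar sketch in plain Python, not the repository's tensor code. The helper names are hypothetical, alpha and gamma are assumed illustration values, and p is taken to be the sigmoid probability rather than the raw logit that F.binary_cross_entropy_with_logits receives.

```python
import math


def bce(p, q):
    """Binary cross entropy for a probability p and a soft target q."""
    return -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))


def varifocal_loss_scalar(p, q, alpha=0.75, gamma=2.0):
    """BCE scaled by q for positives (q > 0) and by alpha * p^gamma for negatives (q == 0)."""
    weight = q if q > 0.0 else alpha * p ** gamma
    return bce(p, q) * weight


p = 0.3
# With q = 0 the q * log(p) term vanishes, leaving only -log(1 - p).
negative_bce = bce(p, 0.0)
# With q > 0 both BCE terms are kept, which is why log(1 - p) appears in that case.
positive_bce = bce(p, 0.6)
print(negative_bce, positive_bce)
print(varifocal_loss_scalar(p, 0.0), varifocal_loss_scalar(p, 0.6))
```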

Answer 7 · 2022-05-27T13:17:40.000Z

Hello, did you add this loss to YOLOv5? If so, which parts of the code need to be adjusted?