umbertogriffo/focal-loss-keras

About Alpha parameter in focal loss

From the paper, the alphas are weights for each example. So why is alpha kept at 0.25? Does this mean giving equal weight to all the examples?

I may be wrong, but this is what I understood from the paper.

No, it doesn't give equal weight to all the examples.

The focusing parameter γ (gamma) smoothly adjusts the rate at which easy examples are down-weighted.
When γ = 0, focal loss is equivalent to categorical cross-entropy, and as γ increases the effect of the modulating factor increases as well (γ = 2 works best in the paper's experiments).

α (alpha) balances the focal loss and yields slightly improved accuracy over the non-α-balanced form.
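
To make the roles of γ and α concrete, here is a minimal sketch of a binary focal loss in Keras/TensorFlow. It is an illustration only, not necessarily this repo's exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def binary_focal_loss(gamma=2.0, alpha=0.25):
    # Sketch of FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = K.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # p_t: predicted probability of the true class
        p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
        # alpha_t: alpha for positives, (1 - alpha) for negatives
        alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
        # (1 - p_t)^gamma down-weights easy examples; gamma = 0 recovers cross-entropy
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss
```

With γ = 0 and α = 0.5, this reduces (up to a constant factor) to plain binary cross-entropy.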

I suggest you read the paper more carefully ;-)

In the paper, the balanced form is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

I am saying that in the equation it is alpha_t, not alpha, meaning alpha_t is different for each example and is not a constant. In the section above (balanced cross-entropy), alpha_t was also different for each example. I think they are saying that when we use a weighted focal loss we get slightly better accuracy.

Note: another thing I want to mention is that alpha = 1 and alpha = 0.25 don't make any difference, because you are just scaling the loss function and the optimal weights of the model will be the same in both cases, so how can it give better accuracy?

For example, in the binary case alpha is the weighting factor: alpha for class 1 and 1 - alpha for class 0, so alpha balances the importance of positive and negative examples.
So you only have to choose a single alpha value.
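
The key point, sketched below with NumPy on hypothetical labels, is that alpha_t is a per-example weight: changing alpha shifts the relative importance of positives versus negatives rather than uniformly rescaling the loss:

```python
import numpy as np

alpha = 0.25
y_true = np.array([1, 0, 0, 1])  # hypothetical labels

# alpha_t is alpha for positives and (1 - alpha) for negatives,
# so alpha = 0.25 weights negatives three times as heavily as positives,
# while alpha = 1 would zero out the negatives entirely.
alpha_t = alpha * y_true + (1 - alpha) * (1 - y_true)
print(alpha_t)  # [0.25 0.75 0.75 0.25]
```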

I saw the code and thought that you were multiplying the whole equation by alpha, but actually you are multiplying by alpha and 1 - alpha.

My bad!!

Thanks for the reply.

Can I define multiple alphas for a multi-class problem?

lix4 commented

In the focal loss paper, it says:
"In practice α may be set by inverse class frequency or treated as a hyperparameter to set by cross validation."
So for each class, I guess you compute its occurrence in the training set and take the inverse.

How do I set α by inverse class frequency? Is it something like this?

```python
import numpy as np
from sklearn.utils import class_weight

# Note: in recent scikit-learn versions, classes and y are keyword-only.
class_weights = dict(zip(np.unique(y_train),
                         class_weight.compute_class_weight('balanced',
                                                           classes=np.unique(y_train),
                                                           y=y_train)))
```
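
For what it's worth, here is a minimal sketch of the inverse-class-frequency idea on hypothetical labels. scikit-learn's 'balanced' mode computes n_samples / (n_classes * count), which is the inverse frequency up to a constant factor, so the call above should give the same relative weights:

```python
import numpy as np

y_train = np.array([0, 0, 0, 1, 1, 2])  # hypothetical labels

classes, counts = np.unique(y_train, return_counts=True)
# Inverse class frequency: rarer classes get larger alphas.
inv_freq = 1.0 / counts
# Normalize so the alphas sum to 1 (optional, but keeps the loss scale stable).
alphas = inv_freq / inv_freq.sum()
print(dict(zip(classes, alphas)))  # {0: 0.1818..., 1: 0.2727..., 2: 0.5454...}
```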