Question on multiplying the loss by a constant in sphereR_H, sphereR_N and sphereR_S

Question

lizhenstat opened this issue 2 years ago · 2 comments

Hi, thanks for sharing the code of your great work on SphereFace Revived. I have a related question:
when using the three proposed normalization methods, why do you multiply the cross-entropy loss by a constant
in sphereR_N, sphereR_H and sphereR_S?

Any help would be appreciated, thanks!

Answer 1 · 2023-04-14T12:46:08.000Z

lw is the loss weight; it controls the scale of the loss.

Answer 2 · 2023-04-16T06:40:30.000Z

Thanks a lot. I found an interesting explanation of why scaling the loss affects the training result.
It seems that with SGD and no regularization, scaling the loss is equivalent to scaling the learning rate by the same factor.
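
Here is a minimal sketch of that equivalence. It assumes a toy one-parameter least-squares problem rather than the SphereFace losses, and plain gradient descent without momentum or weight decay. The names grad, sgd and loss_weight are illustrative and not taken from the repository. Multiplying the loss by a constant multiplies its gradient by the same constant, so the update lr * (lw * g) matches (lr * lw) * g.

```python
def grad(w, xs, ys):
    """Gradient of 0.5 * mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n


def sgd(w0, xs, ys, lr, loss_weight, steps):
    """Plain gradient descent on loss_weight * loss (no momentum, no weight decay)."""
    w = w0
    for _ in range(steps):
        # Scaling the loss by loss_weight scales its gradient by the same factor.
        w -= lr * (loss_weight * grad(w, xs, ys))
    return w


# Toy data generated by y = 2 * x, so the optimum is w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lw = 4.0  # hypothetical loss weight, chosen only for illustration

# Run A: scaled loss, base learning rate.
w_scaled_loss = sgd(0.0, xs, ys, lr=0.01, loss_weight=lw, steps=50)
# Run B: unscaled loss, learning rate multiplied by the same constant.
w_scaled_lr = sgd(0.0, xs, ys, lr=0.01 * lw, loss_weight=1.0, steps=50)

print(abs(w_scaled_loss - w_scaled_lr) < 1e-9)  # prints True
```

The match relies on the conditions mentioned above. If weight decay is applied by the optimizer rather than through the scaled loss, or if an adaptive optimizer such as Adam is used, scaling the loss no longer behaves like scaling the learning rate.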