ydwen/opensphere

Question on multiplying a constant into the loss in sphereR_H, sphereR_N and sphereR_S

lizhenstat opened this issue · 2 comments

Hi, thanks for sharing the code of your great work on SphereFace Revived. I have a related question:
when using the three proposed normalization methods, why do you scale the cross-entropy loss by a constant
in sphereR_N, sphereR_H and sphereR_S?

Any help would be appreciated, thanks!

ydwen commented

lw is the loss weight; it controls the scale of the loss.

Thanks a lot. I found an interesting explanation of why scaling the loss affects the training result:
under plain SGD with no regularization, scaling the loss by a constant is equivalent to scaling the learning rate by the same constant.
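A minimal sketch of that equivalence (a hypothetical toy quadratic loss, not the repo's code): vanilla SGD updates with w <- w - lr * grad(loss), and multiplying the loss by a constant c multiplies the gradient by c, which is indistinguishable from multiplying the learning rate by c.

```python
def sgd_trajectory(w0, lr, loss_scale, steps=10):
    """Minimize loss_scale * (w - 3)^2 by plain SGD.

    The analytic gradient of the scaled loss is loss_scale * 2 * (w - 3),
    so the constant factor on the loss shows up directly in the update.
    """
    w = w0
    trajectory = []
    for _ in range(steps):
        grad = loss_scale * 2.0 * (w - 3.0)
        w = w - lr * grad
        trajectory.append(w)
    return trajectory

c = 5.0
# Scaled loss with the base learning rate...
scaled_loss = sgd_trajectory(w0=0.0, lr=0.1, loss_scale=c)
# ...versus unscaled loss with the learning rate scaled by the same constant.
scaled_lr = sgd_trajectory(w0=0.0, lr=0.1 * c, loss_scale=1.0)

# The two parameter trajectories coincide step for step.
print(all(abs(x - y) < 1e-12 for x, y in zip(scaled_loss, scaled_lr)))  # True
```

Note that the equivalence holds only for plain SGD: once momentum buffers, weight decay, or adaptive optimizers such as Adam enter the update rule, loss scale and learning rate are no longer interchangeable.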