Question on multiplying constant to sphereR_H, sphereR_N and sphereR_N
lizhenstat opened this issue · 2 comments
lizhenstat commented
ydwen commented
lw is the loss weight; it controls the scale of the loss.
lizhenstat commented
Thanks a lot. I found an interesting explanation of why scaling the loss affects the training result.
Under plain SGD with no regularization, scaling the loss by a constant is equivalent to scaling the learning rate by the same constant.
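
To make the equivalence concrete, here is a minimal sketch in plain Python. It is not code from this repository: the toy squared-error loss, the data, and the names `grad`, `sgd`, `loss_scale` and `weight_decay` are all assumptions made for illustration. `loss_scale` plays the role of lw. With no weight decay, multiplying the loss by 4 gives the same weights as multiplying the learning rate by 4; once a weight-decay term is added inside the optimizer step, the two runs diverge.

```python
import math


def grad(w, x, y):
    # Gradient of the toy loss 0.5 * (w * x - y) ** 2 with respect to w.
    return (w * x - y) * x


def sgd(w, data, lr, loss_scale=1.0, weight_decay=0.0, epochs=20):
    # Plain SGD, one sample per step. loss_scale multiplies the loss
    # (and therefore its gradient); weight_decay is applied by the
    # optimizer and is NOT multiplied by loss_scale.
    for _ in range(epochs):
        for x, y in data:
            g = loss_scale * grad(w, x, y)
            w -= lr * (g + weight_decay * w)
    return w


# Hypothetical toy data, chosen only for this illustration.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]

# No regularization: 4x loss at lr=0.01 matches 1x loss at lr=0.04.
w_scaled_loss = sgd(0.0, data, lr=0.01, loss_scale=4.0)
w_scaled_lr = sgd(0.0, data, lr=0.04, loss_scale=1.0)
print(math.isclose(w_scaled_loss, w_scaled_lr, rel_tol=1e-9))

# With weight decay the equivalence no longer holds.
w_scaled_loss_wd = sgd(0.0, data, lr=0.01, loss_scale=4.0, weight_decay=0.1)
w_scaled_lr_wd = sgd(0.0, data, lr=0.04, loss_scale=1.0, weight_decay=0.1)
print(math.isclose(w_scaled_loss_wd, w_scaled_lr_wd, rel_tol=1e-9))
```

The reason is visible in the update rule: the step is `lr * loss_scale * gradient`, so only the product of the two matters, while the decay term is multiplied by `lr` alone. Adaptive optimizers that normalize the gradient behave differently, so this sketch should not be read as applying to them.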