About the loss function problem
Closed this issue · 2 comments
Pefect96 commented
In the loss function (line 81 in cb69f28), nn.L1Loss is inconsistent with Eq. 3 in the paper: Eq. 3 does not include an nn.L1Loss term.
bravePinocchio commented
Why isn't this line of code used instead of line 29? In the paper, beta is an exponent.
Lines 28 to 29 in cb69f28
udion commented
This is for training stability. Using beta as an exponent made training unstable and it hit NaNs, so we used a Taylor series approximation instead. It is explained in Appendix A of the paper (Eq. 16): https://arxiv.org/pdf/2307.00398
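
To make the stability point concrete, here is a minimal sketch in plain Python. It is not the repository's code, and it is not claimed to match Eq. 16. The function names `power_term` and `first_order_term` are hypothetical, and the expansion used (first order in beta around beta = 1) is an assumption chosen only to show why a Taylor-style surrogate grows far more slowly than the exact power when the scaled residual is large.

```python
import math


def power_term(residual, alpha, beta):
    """Exact power term (|residual| / alpha) ** beta.

    Illustrative only: with a large scaled residual and a large beta this
    value explodes, which is the kind of behaviour that can overflow in
    low-precision training.
    """
    r = abs(residual) / alpha
    return r ** beta


def first_order_term(residual, alpha, beta):
    """First-order Taylor expansion of r ** beta in beta around beta = 1.

    r ** beta = r * exp((beta - 1) * ln r) ~= r * (1 + (beta - 1) * ln r)

    Assumption: this generic expansion is for illustration and may differ
    from the approximation used in the paper. Note that it can go negative
    for small r and large beta, so it is only accurate near beta = 1.
    """
    r = abs(residual) / alpha
    if r == 0.0:
        # ln(0) is undefined; the exact term is 0 for any beta > 0.
        return 0.0
    return r * (1.0 + (beta - 1.0) * math.log(r))


if __name__ == "__main__":
    # Hypothetical values: a large residual with a small predicted scale.
    residual, alpha = 50.0, 0.01
    for beta in (1.0, 1.1, 2.0, 8.0):
        exact = power_term(residual, alpha, beta)
        approx = first_order_term(residual, alpha, beta)
        print(f"beta={beta}: exact={exact:.3e}, first-order={approx:.3e}")
```

Running the script prints both terms side by side. The two agree at beta = 1 and stay close nearby, while for larger beta the exact power grows by many orders of magnitude and the first-order surrogate stays bounded by a factor that is linear in beta.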