Logits shift in loss computation
Opened this issue · 1 comment
shivamag125 commented
While computing the loss (L136), shouldn't the logits and targets be rolled to account for next-token prediction?
shivamag125 commented
Edit: I see that you already took care of it while preparing the targets.
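
For anyone else who lands here with the same question, here is a minimal sketch of why shifting the targets during data preparation gives the same loss as shifting at loss time. It is not the repository's code: `toy_model`, `cross_entropy`, and the token values are hypothetical stand-ins, and the toy model only mimics the causal property that the output at position i does not depend on later tokens.

```python
import math

def cross_entropy(logits, targets):
    """Mean negative log-likelihood over positions (pure Python, for illustration)."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += log_z - row[t]
    return total / len(targets)

def toy_model(inputs, vocab_size=5):
    """Hypothetical stand-in for a causal LM: one row of logits per input position.
    Each row depends only on that position and its token, never on later tokens."""
    return [[math.sin(tok + 1.3 * v + 0.1 * pos) for v in range(vocab_size)]
            for pos, tok in enumerate(inputs)]

tokens = [3, 1, 4, 1, 2]

# Option A: shift while preparing the data, so the loss needs no roll.
x = tokens[:-1]   # inputs:  [3, 1, 4, 1]
y = tokens[1:]    # targets: [1, 4, 1, 2]
loss_a = cross_entropy(toy_model(x), y)

# Option B: feed the full sequence and shift at loss time instead.
logits_full = toy_model(tokens)
loss_b = cross_entropy(logits_full[:-1], tokens[1:])

print(loss_a, loss_b)
```

Both options pair the logits at position i with the token at position i + 1, so only one of them should be applied. Doing both would shift twice and train the model to predict the token two steps ahead.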