jzhang38/EasyContext

Logits shift in loss computation

Opened this issue · 1 comment

While computing the loss at L136, shouldn't the logits and targets be shifted to account for next-token prediction?

Similar to https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1092
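For context, the two equivalent ways of handling the shift can be sketched as follows (a minimal illustration, not the repository's actual code; `IGNORE_INDEX` is the conventional value used by Hugging Face, assumed here):

```python
# Illustrative sketch: for next-token prediction, the logits at position t
# must be scored against the token at position t+1.
tokens = [10, 20, 30, 40]  # token ids for one toy sequence

# Option A (as in HF's modeling_llama.py): shift after the forward pass.
shift_positions = tokens[:-1]  # positions whose logits are used
shift_labels = tokens[1:]      # each position predicts the next token

# Option B (pre-shift the targets when preparing the batch, so no shift
# is needed at loss time): pad the tail with an ignored index.
IGNORE_INDEX = -100
pre_shifted_targets = tokens[1:] + [IGNORE_INDEX]

# Both schemes pair the logits at position t with token t+1.
assert shift_labels == pre_shifted_targets[: len(tokens) - 1]
print("shifts match")
```
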

Edit: I see that you took care of this while preparing the targets.