jzhang38/EasyContext

Logits shift in loss computation

Opened this issue · 1 comment

While computing the loss at L136, shouldn't the logits and targets be shifted to account for next-token prediction?

Similar to https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1092
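For context, the two equivalent ways of handling the shift can be sketched as follows (a minimal illustration, not the repository's actual code; `IGNORE_INDEX` is the conventional value used by Hugging Face, assumed here):

```python
# Illustrative sketch: for next-token prediction, the logits at position t
# must be scored against the token at position t+1.
tokens = [10, 20, 30, 40]  # token ids for one toy sequence

# Option A (as in HF's modeling_llama.py): shift after the forward pass.
shift_positions = tokens[:-1]  # positions whose logits are used
shift_labels = tokens[1:]      # each position predicts the next token

# Option B (pre-shift the targets when preparing the batch, so no shift
# is needed at loss time): pad the tail with an ignored index.
IGNORE_INDEX = -100
pre_shifted_targets = tokens[1:] + [IGNORE_INDEX]

# Both schemes pair the logits at position t with token t+1.
assert shift_labels == pre_shifted_targets[: len(tokens) - 1]
print("shifts match")
```
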

Edit: I see that you took care of this while preparing the targets.