Is there an ignore_index ability in the loss calculation?
exnx opened this issue · 2 comments
Is there a way to incorporate an `ignore_index` ability like `cross_entropy` in PyTorch? Right now the default is sequence packing, so I guess taking the loss across the whole sequence makes sense (there isn't much padding then). I added the ability to disable sequence packing and instead pad each sample up to the context length, but I'd like to ignore those padding tokens in the loss calculation.
I was curious whether anybody knows of this feature or has implemented it themselves? Thanks!
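For reference, this is what the `ignore_index` behavior in PyTorch's `cross_entropy` looks like: target positions set to the ignore index contribute nothing to the loss or to the averaging denominator. A minimal sketch (the tensor shapes and padding layout here are made up for illustration):

```python
import torch
import torch.nn.functional as F

PAD = -100  # PyTorch's default ignore_index value
vocab = 10
logits = torch.randn(3, 5, vocab)            # (batch, seq_len, vocab)
labels = torch.randint(0, vocab, (3, 5))     # (batch, seq_len)
labels[:, 3:] = PAD                          # mark trailing positions as padding

# Padding positions are excluded from both the sum and the mean.
loss = F.cross_entropy(
    logits.reshape(-1, vocab), labels.reshape(-1), ignore_index=PAD
)
```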
The `loss_mask` (see lines 102 to 104 at commit 7267a74) can be set to 0 to ignore the loss for certain token positions! (I can't recall whether it is right-shifted relative to the input token ids, though.)
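A minimal sketch of how such a mask is typically applied (the variable names here are hypothetical, not taken from the repo's code): compute the per-token loss with `reduction="none"`, zero out the masked positions, and average over the unmasked count.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)
labels = torch.randint(0, vocab, (batch, seq_len))

# loss_mask: 1.0 for real tokens, 0.0 for padding positions to ignore.
loss_mask = torch.ones(batch, seq_len)
loss_mask[:, 6:] = 0.0  # pretend the last two positions are padding

# Per-token loss without reduction, then mask and average over real tokens only.
per_token = F.cross_entropy(
    logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
).reshape(batch, seq_len)
loss = (per_token * loss_mask).sum() / loss_mask.sum()
```

Dividing by `loss_mask.sum()` rather than the full sequence length keeps the loss scale independent of how much padding each batch contains.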
@exnx -- Please reopen if this doesn't answer your question! :)