EleutherAI/gpt-neox

is there any ignore_index ability in the loss calculation?

exnx opened this issue · 2 comments

exnx commented

Is there a way to incorporate an ignore_index ability like cross_entropy in PyTorch? Right now the default is sequence packing, so I guess taking the loss across the whole sequence makes sense (not much padding then). I added the ability to remove sequence packing and instead pad each sample up to the context length, but I'd like to ignore those padding tokens in the loss calculation.

I was curious if anybody knows of this feature or has implemented it themselves. Thanks!
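For reference, this is the behavior PyTorch's built-in cross_entropy provides via its ignore_index argument; a minimal sketch (the pad_id value and the toy shapes here are illustrative, not from the repo):

```python
import torch
import torch.nn.functional as F

# Toy logits: 4 token positions over a 10-token vocabulary.
logits = torch.randn(4, 10)
pad_id = 0  # illustrative padding token id
targets = torch.tensor([5, 3, pad_id, pad_id])

# Targets equal to ignore_index contribute nothing to the loss,
# and the mean is taken only over the remaining positions.
loss = F.cross_entropy(logits, targets, ignore_index=pad_id)
```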

gpt-neox/megatron/utils.py

Lines 102 to 104 in 7267a74

```python
loss_mask = torch.ones(data.size(), dtype=torch.float, device=data.device)
if eod_mask_loss:
    loss_mask[data == eod_token] = 0.0
```

The loss_mask can be set to 0 to ignore the loss at certain token positions! (I can't recall whether it is right-shifted relative to the input token IDs though.)
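A minimal sketch of extending that pattern to padding tokens; pad_token is a hypothetical id for the padding token added when sequence packing is disabled (it is not a parameter of the actual function in megatron/utils.py), and the masked mean mirrors how a loss_mask is typically applied:

```python
import torch

def get_loss_mask(data, eod_token, pad_token, eod_mask_loss=True):
    # Same pattern as megatron/utils.py: start with every position counted.
    loss_mask = torch.ones(data.size(), dtype=torch.float, device=data.device)
    if eod_mask_loss:
        loss_mask[data == eod_token] = 0.0
    # Hypothetical extension: also zero out padding positions.
    loss_mask[data == pad_token] = 0.0
    return loss_mask

def masked_mean_loss(per_token_loss, loss_mask):
    # Zero out masked positions, then average only over unmasked ones;
    # clamp avoids division by zero if every position is masked.
    return (per_token_loss * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)
```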

@exnx -- Please reopen if this doesn't answer your question! :)