Lightning-Universe/lightning-transformers

Ignore padding tokens when using Translation Task + `padding='max_length'`

SeanNaren opened this issue · 6 comments

๐Ÿ› Bug

When using the Translation Task, we need to ensure that we skip padding tokens in the loss calculation. Currently we do not replace the padding tokens with -100, which could be detrimental.
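As a minimal illustration of why this matters (toy tensors, not the task's real code): with `padding='max_length'`, pad positions in the labels contribute to the cross-entropy loss unless they are set to `-100`, the default `ignore_index` of PyTorch's cross-entropy.

```python
import torch
import torch.nn.functional as F

vocab_size, pad_token_id = 8, 0
# Toy logits for a batch of 1 sequence of length 6 (4 real tokens + 2 pad).
logits = torch.randn(1, 6, vocab_size)
labels = torch.tensor([[5, 3, 7, 2, pad_token_id, pad_token_id]])

# Loss with padding scored like ordinary targets.
loss_with_pad = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

# Loss with pad positions replaced by -100, the default ignore_index.
masked = labels.masked_fill(labels == pad_token_id, -100)
loss_ignoring_pad = F.cross_entropy(logits.view(-1, vocab_size), masked.view(-1))

print(loss_with_pad, loss_ignoring_pad)  # the two values differ
```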

Hi @SeanNaren, I'm looking to contribute to ML software projects and have been using pytorch-lightning myself (read: I'm a fan!). Can you tell me where to get started on this issue? I'd like to scope whether I can devote some of my time to fixing this one.

Borda commented

@SeanNaren would you have some pointers on how/where to start? 🐰

Hi @SeanNaren, @Borda, I think this is what needs to be modified.

I referred to this example.
There, TranslationTransformer is used for training, and it inherits from Seq2SeqTransformer. Looking at this line, the output is loss, logits; however, the loss is calculated with the padding tokens taken into account.

I found an answer on how to solve it, described by the Hugging Face community here.

So, in simple terms, I guess the change to be made in that same line, i.e. here, is (a rough sketch follows the list):

  1. Obtain the loss, logits from the common step.
  2. Initialize the cross-entropy loss with ignore_index=-100.
  3. Set the target tokens that equal the padding token id (0 here) to -100.
  4. Calculate the final loss and then perform the remaining steps as usual.
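A rough sketch of the masking step described above, assuming the batch exposes a `labels` tensor and the tokenizer's `pad_token_id` is available; the helper name and the commented usage are hypothetical, not part of the library:

```python
import torch


def _mask_padding_in_labels(batch: dict, pad_token_id: int) -> dict:
    """Replace pad token ids in the labels with -100 so that the
    cross-entropy loss (ignore_index defaults to -100) skips them."""
    labels = batch["labels"]
    batch["labels"] = labels.masked_fill(labels == pad_token_id, -100)
    return batch


# Hypothetical usage inside the task's step, before the loss is computed:
# batch = _mask_padding_in_labels(batch, self.tokenizer.pad_token_id)
# loss, logits = ...  # common step as before
```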

Hope this helps in solving the issue.

Borda commented

@spranjal25 are you fine with @uakarsh's suggestion?

> @spranjal25 are you fine with @uakarsh's suggestion?

Yes, that's very helpful. I think I can get started on it. I'll pick this up as soon as I get some free time. Are we looking at a timeline here, though, @Borda?

Borda commented

no special rush :)