Lightning-Universe/lightning-transformers

Ignore padding tokens when using Translation Task + `padding='max_length'`

SeanNaren opened this issue · 6 comments

๐Ÿ› Bug

When using the Translation Task, we need to ensure that we skip padding tokens in the loss calculation. Currently we do not replace the padding tokens with -100, which could be detrimental.
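As a minimal illustration of why this matters (toy tensors, not the task's real code): with `padding='max_length'`, pad positions in the labels contribute to the cross-entropy loss unless they are set to `-100`, the default `ignore_index` of PyTorch's cross-entropy.

```python
import torch
import torch.nn.functional as F

vocab_size, pad_token_id = 8, 0
# Toy logits for a batch of 1 sequence of length 6 (4 real tokens + 2 pad).
logits = torch.randn(1, 6, vocab_size)
labels = torch.tensor([[5, 3, 7, 2, pad_token_id, pad_token_id]])

# Loss with padding scored like ordinary targets.
loss_with_pad = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

# Loss with pad positions replaced by -100, the default ignore_index.
masked = labels.masked_fill(labels == pad_token_id, -100)
loss_ignoring_pad = F.cross_entropy(logits.view(-1, vocab_size), masked.view(-1))

print(loss_with_pad, loss_ignoring_pad)  # the two values differ
```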

Hi @SeanNaren, I'm looking to contribute to ML software projects and have been using pytorch-lightning myself (read: I'm a fan!). Can you tell me where to get started on this issue? I'd like to scope whether I can devote some of my time to fixing this one.

Borda commented

@SeanNaren would you have some pointers on how/where to start? 🐰

Hi @SeanNaren, @Borda, I think this is what needs to be modified.

I referred to this example.
There, TranslationTransformer is used for training, and it inherits from Seq2SeqTransformer. Looking at this line, the output is loss, logits; however, the loss is calculated with the padding tokens taken into account.

I found an answer on how to solve it, described by the Hugging Face community here.

So, in simple terms, I guess the change to be made in that same line, i.e. here, is (a rough sketch follows the list):

  1. Obtain the loss, logits from the common step.
  2. Initialize the cross-entropy loss with ignore_index=-100.
  3. Set the target tokens that equal the padding token id (0 here) to -100.
  4. Calculate the final loss and then perform the remaining steps as usual.
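A rough sketch of the masking step described above, assuming the batch exposes a `labels` tensor and the tokenizer's `pad_token_id` is available; the helper name and the commented usage are hypothetical, not part of the library:

```python
import torch


def _mask_padding_in_labels(batch: dict, pad_token_id: int) -> dict:
    """Replace pad token ids in the labels with -100 so that the
    cross-entropy loss (ignore_index defaults to -100) skips them."""
    labels = batch["labels"]
    batch["labels"] = labels.masked_fill(labels == pad_token_id, -100)
    return batch


# Hypothetical usage inside the task's step, before the loss is computed:
# batch = _mask_padding_in_labels(batch, self.tokenizer.pad_token_id)
# loss, logits = ...  # common step as before
```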

Hope this helps in solving the issue.

Borda commented

@spranjal25 are you fine with @uakarsh's suggestion?

> @spranjal25 are you fine with @uakarsh's suggestion?

Yes, that's very helpful. I think I can get started on it. I'll pick this up as soon as I get some free time. Are we looking at a timeline here, though, @Borda?

Borda commented

no special rush :)