elttaes/Revisiting-PLMs

Does the padding affect the training of SSP?

huangtinglin opened this issue · 4 comments

Hi. Thanks for your interesting work!

I noticed that you use the pad token to align sequences of different lengths and MSAs of different depths. I'm curious whether the resulting representations with the padding token are consistent with the representations without it. Additionally, I would like to know whether applying a convolutional network on representations that include padded positions would affect training and prediction. Looking forward to your response!
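For concreteness, this is the kind of consistency check I have in mind (just a sketch, assuming the standard `esm.pretrained.esm1b_t33_650M_UR50S` loading API; the sequences are made up):

```python
import torch
import esm

# Encode the same sequence alone and inside a batch where it gets padded
# to a longer sequence's length, then compare its per-residue representations.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "MKTAYIAKQR"
alone = [("seq1", seq)]
padded = [("seq1", seq), ("seq2", seq * 4)]  # seq1 is now padded to the longer length

_, _, toks_alone = batch_converter(alone)
_, _, toks_padded = batch_converter(padded)

with torch.no_grad():
    rep_alone = model(toks_alone, repr_layers=[33])["representations"][33]
    rep_padded = model(toks_padded, repr_layers=[33])["representations"][33]

# Positions 1..len(seq) are the residues (position 0 is the BOS token).
print(torch.allclose(rep_alone[0, 1:len(seq) + 1],
                     rep_padded[0, 1:len(seq) + 1], atol=1e-5))
```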

Hi,
The padded positions are set to 0 during the forward pass, so the representation should be exactly the same whether or not there is padding.
https://github.com/facebookresearch/esm/blob/main/esm/model/esm1.py#L121
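For reference, this is roughly what the linked line does (a simplified paraphrase, not the exact ESM code):

```python
import torch

def zero_padded_positions(x, tokens, padding_idx):
    # x: (batch, seq_len, embed_dim) embeddings; tokens: (batch, seq_len) token ids.
    # Positions equal to the padding index are zeroed out after embedding,
    # so padded positions contribute nothing to the computation.
    padding_mask = tokens.eq(padding_idx)
    return x * (1 - padding_mask.unsqueeze(-1).type_as(x))
```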

Thanks for your reply. Actually, I found that padding along the depth dimension of MSAs does lead to inconsistent results, because of the attention rescaling factor. I have posted this issue on the ESM repo: facebookresearch/esm#491.
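To illustrate what I mean (a toy sketch, not the actual ESM code): in the MSA Transformer's row self-attention, the scaling factor depends on the number of MSA rows, so padded rows change that factor even if they are masked out of the attention itself.

```python
import math

head_dim = 16  # hypothetical head dimension

def row_attn_scaling(num_rows: int) -> float:
    # Simplified version of the depth-dependent scaling used in row self-attention:
    # the usual 1/sqrt(head_dim) factor is further divided by sqrt(num_rows).
    return head_dim ** -0.5 / math.sqrt(num_rows)

print(row_attn_scaling(num_rows=8))    # depth of the real MSA
print(row_attn_scaling(num_rows=32))   # same MSA padded up to depth 32 -> different factor
```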

Besides, it seems the implementation doesn't apply the padding mask when computing the loss and accuracy. Doesn't that affect model training?

Hi,
The padding along the depth of the MSAs does indeed influence the result. I will need to run some experiments to confirm its impact on the final results. Thank you for spotting this.
You are right that the padding mask should be applied in the loss function; it may affect the results slightly. I will fix it.
Thank you.
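Something along these lines is what I have in mind (a minimal sketch, assuming token-level logits of shape (batch, length, classes) and a boolean padding mask that is True at padded positions):

```python
import torch
import torch.nn.functional as F

def masked_loss_and_acc(logits, labels, padding_mask):
    # Keep only real residues; padded positions are excluded from both
    # the cross-entropy loss and the accuracy computation.
    valid = ~padding_mask                          # (batch, seq_len) boolean
    loss = F.cross_entropy(logits[valid], labels[valid])
    preds = logits.argmax(dim=-1)
    acc = (preds[valid] == labels[valid]).float().mean()
    return loss, acc
```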

I have no more questions. Thanks again for your interesting work!