microsoft/ProphetNet

GENIE: decoder_nll loss is always equal to 0 for training from scratch

BaohaoLiao opened this issue

Hi @qiweizhen,

I'm trying to reproduce your reported from-scratch training result on XSum. However, the decoder_nll loss is always exactly 0, which is odd for a cross-entropy loss.

If I load your pre-trained model instead, the loss is non-zero. Do you know the reason?
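For what it's worth, here is the sanity check I would run, assuming decoder_nll is the standard token-level cross-entropy used by embedding-diffusion LMs (the `pad_id` argument and tensor shapes below are my assumptions, not the repo's actual API):

```python
import torch
import torch.nn.functional as F

def decoder_nll_check(logits, target_ids, pad_id=0):
    """Sanity-check a token-level cross-entropy (decoder_nll)."""
    # logits: (batch, seq_len, vocab); target_ids: (batch, seq_len)
    loss = F.cross_entropy(
        logits.transpose(1, 2),  # cross_entropy expects (batch, vocab, seq_len)
        target_ids,
        ignore_index=pad_id,
        reduction="none",
    )  # (batch, seq_len), zeros at ignored positions
    n_real = (target_ids != pad_id).sum()
    if n_real == 0:
        # Every target position is padding/masked, so the mean over real
        # tokens is trivially 0 -- a common cause of a constant-zero loss.
        raise ValueError("all target tokens are masked; check the data pipeline")
    return loss.sum() / n_real

# Example: random logits should give a loss near log(vocab_size), not 0.
logits = torch.randn(2, 8, 100)
targets = torch.randint(1, 100, (2, 8))
print(decoder_nll_check(logits, targets))  # ~ log(100) ≈ 4.6
```

With untrained random logits the loss should sit near log(vocab_size); a constant 0 usually points at an all-masked target batch rather than at the model.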

Hi, I'm trying to reproduce the XSum from-scratch result too, using the recommended parameters from the README, but my ROUGE scores come out much lower than those reported in the paper. Any suggestions? @qiweizhen, thank you!
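For reference, this is roughly how I score the outputs, using Google's rouge-score package; the paper may have used a different ROUGE implementation (e.g. pyrouge/files2rouge), which can shift the numbers slightly:

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
# score(reference, prediction); both arguments are plain strings
scores = scorer.score(
    "police killed the gunman",      # reference summary (toy example)
    "the gunman was shot by police",  # generated summary (toy example)
)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```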

Diffusion models without pre-training often require more training steps. If you want to reproduce the results from scratch, you need to increase --lr_anneal_steps (e.g., 400k steps for XSum). We hope this suggestion helps.
We have noticed that our description of training from scratch in the README has caused some misunderstanding, and we will update and correct it in the next version. Thank you for your feedback.
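For clarity, if GENIE follows the improved-diffusion training-loop convention that the flag name suggests, --lr_anneal_steps both caps the total number of updates and linearly anneals the learning rate to zero over that horizon; a minimal sketch of that assumption:

```python
def anneal_lr(step, base_lr, lr_anneal_steps):
    # Linear decay of the learning rate to zero over lr_anneal_steps,
    # following the improved-diffusion training-loop convention; whether
    # GENIE does exactly this is an assumption based on the flag name.
    if not lr_anneal_steps:
        return base_lr
    frac_done = min(step / lr_anneal_steps, 1.0)
    return base_lr * (1 - frac_done)

# With --lr_anneal_steps 400000, the LR only reaches zero at step 400k;
# a much smaller value ends training (and learning) far too early.
```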