Questions about the NLL loss
AlonzoLeeeooo commented
Hi @XiangLi1999,
Thanks for the amazing work! I have encountered some questions while implementing DiffusionLM:
- During my experiments, I notice that `decoder_nll` (essentially a CE loss) equals zero for a period of training (about 8k steps), and then `decoder_nll` starts taking increasing values. Is this phenomenon normal for the training of DiffusionLM? How would `decoder_nll` behave if the training were implemented correctly? (A sketch of how I compute both loss terms follows this list.)
- The second question is about `tT_loss`. `tT_loss` stays at a constant value during training (about 1.3e-7). This happens when I apply cosine annealing with warmup to the learning rate (also sketched below). However, when I use a constant learning rate or a linear decay strategy, `tT_loss` starts decreasing. I am now confused about which curve is correct for training DiffusionLM. Could you explain a little bit about how the loss curve of `tT_loss` should look if DiffusionLM is trained correctly?
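
For context, here is a minimal sketch of how I compute the two terms in my implementation, in case this is where the problem lies (the helper `get_logits` and the scalar `alpha_bar_T` are placeholders for the corresponding pieces of my code, not the exact names from your repo):

```python
import math
import torch
import torch.nn.functional as F

def decoder_nll(x_start, input_ids, get_logits):
    # Rounding loss: cross-entropy of decoding the clean embeddings x_start
    # back to the original discrete tokens.
    logits = get_logits(x_start)  # (batch, seq_len, vocab_size)
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        input_ids.view(-1),
    )

def tT_loss(x_start, alpha_bar_T):
    # Prior KL: KL( q(x_T | x_0) || N(0, I) ), i.e. how far the fully
    # noised embeddings are from the standard-normal prior.
    # alpha_bar_T is the cumulative noise-schedule product at step T (a float).
    mean = math.sqrt(alpha_bar_T) * x_start
    logvar = math.log(1.0 - alpha_bar_T)
    # Closed-form KL between N(mean, exp(logvar)) and N(0, 1), elementwise
    kl = 0.5 * (-1.0 - logvar + mean.pow(2) + math.exp(logvar))
    return kl.mean()
```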
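And this is roughly the warmup + cosine annealing schedule I mentioned in the second question (again a sketch; `warmup_steps` and `total_steps` are my own names):

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, warmup_steps, total_steps):
    # Linear warmup from 0 to the base LR, then cosine annealing back to 0.
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```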
Thank you in advance for paying attention to this issue despite your busy schedule. It would do me a big favor if you could help me out with the questions above.
Best,