the value of the loss function appears to be nan
Opened this issue · 1 comment
CurryxIaoHu commented
Hi! Thanks for the amazing work!
When I tried to reproduce the results, the loss becomes NaN during the finetuning process:
batch: 0, GD_loss: 8.13, RD_loss: 7.95, reversed_kl_loss: nan, combined_loss: nan,
batch: 1, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 2, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 3, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 4, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 5, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 6, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 7, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 8, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 9, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
I did not change the hyperparameters provided in the demo except for the learning rate. I tried both increasing and decreasing the learning rate, but neither helped.
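One detail in the log worth noting: at batch 0, GD_loss and RD_loss are still finite and only reversed_kl_loss is NaN; from batch 1 on, everything is NaN, which suggests the reverse-KL term (typically sensitive to log(0) or division by zero) poisons the weights on the first backward pass. A minimal sketch of a NaN watchdog that could catch this at the first bad batch; the helper name and the loss names are taken from the log above, not from the repo:

```python
import math

# Hypothetical helper (not part of the repo): return the first loss
# term that is non-finite (NaN or inf), so training can be halted
# before the bad gradient corrupts the model weights.
def first_nonfinite(named_losses):
    for name, value in named_losses.items():
        if not math.isfinite(value):
            return name
    return None

# Mirroring the batch-0 log line above:
losses = {"GD_loss": 8.13, "RD_loss": 7.95,
          "reversed_kl_loss": float("nan")}
print(first_nonfinite(losses))  # → reversed_kl_loss
```

In a real training loop the same check would run on each batch's loss values (e.g. after calling `.item()` on each tensor), raising or breaking as soon as any term goes non-finite.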
franciscoliu commented
Hi, thank you for the question. It is hard to tell from the output alone. I reran the code on my side and the problem did not appear. Have you tried commenting out each loss term to see whether each one works fine individually?
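The suggestion above can be sketched as gating each term behind a flag, so terms are disabled one at a time to isolate the one that first produces NaN. The term names mirror the log output; the actual loss combination in the repo may differ:

```python
import math

# Hypothetical sketch: combine the loss terms with per-term toggles.
# Disabling a term removes it from the sum entirely, so a NaN in a
# disabled term can no longer contaminate the combined loss.
def combined_loss(gd, rd, rkl, use_gd=True, use_rd=True, use_rkl=True):
    total = 0.0
    if use_gd:
        total += gd
    if use_rd:
        total += rd
    if use_rkl:
        total += rkl
    return total

# With the suspect reversed-KL term disabled, the combined loss stays finite:
print(math.isfinite(
    combined_loss(8.13, 7.95, float("nan"), use_rkl=False)))  # → True
```

Running a few batches with `use_rkl=False` and confirming the other terms stay finite would localize the problem to the reverse-KL computation.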