the value of the loss function appears to be nan
Opened this issue · 1 comment
CurryxIaoHu commented
Hi! Thanks for the amazing work!
When I tried to reproduce the results, the loss becomes NaN during the finetuning process:
batch: 0, GD_loss: 8.13, RD_loss: 7.95, reversed_kl_loss: nan, combined_loss: nan,
batch: 1, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 2, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 3, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 4, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 5, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 6, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 7, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 8, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
batch: 9, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan,
I did not change the hyperparameters provided in the demo except for the learning rate. I tried both increasing and decreasing the learning rate, but neither helped.
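One detail in the log worth noting: at batch 0, GD_loss and RD_loss are still finite and only reversed_kl_loss is NaN; from batch 1 on, everything is NaN, which suggests the reverse-KL term (typically sensitive to log(0) or division by zero) poisons the weights on the first backward pass. A minimal sketch of a NaN watchdog that could catch this at the first bad batch; the helper name and the loss names are taken from the log above, not from the repo:

```python
import math

# Hypothetical helper (not part of the repo): return the first loss
# term that is non-finite (NaN or inf), so training can be halted
# before the bad gradient corrupts the model weights.
def first_nonfinite(named_losses):
    for name, value in named_losses.items():
        if not math.isfinite(value):
            return name
    return None

# Mirroring the batch-0 log line above:
losses = {"GD_loss": 8.13, "RD_loss": 7.95,
          "reversed_kl_loss": float("nan")}
print(first_nonfinite(losses))  # → reversed_kl_loss
```

In a real training loop the same check would run on each batch's loss values (e.g. after calling `.item()` on each tensor), raising or breaking as soon as any term goes non-finite.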
franciscoliu commented
Hi, thank you for the question. It is hard to tell from the output alone. I reran the code on my side and the problem did not appear. Have you tried commenting out each loss term to see whether each one works fine individually?
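The suggestion above can be sketched as gating each term behind a flag, so terms are disabled one at a time to isolate the one that first produces NaN. The term names mirror the log output; the actual loss combination in the repo may differ:

```python
import math

# Hypothetical sketch: combine the loss terms with per-term toggles.
# Disabling a term removes it from the sum entirely, so a NaN in a
# disabled term can no longer contaminate the combined loss.
def combined_loss(gd, rd, rkl, use_gd=True, use_rd=True, use_rkl=True):
    total = 0.0
    if use_gd:
        total += gd
    if use_rd:
        total += rd
    if use_rkl:
        total += rkl
    return total

# With the suspect reversed-KL term disabled, the combined loss stays finite:
print(math.isfinite(
    combined_loss(8.13, 7.95, float("nan"), use_rkl=False)))  # → True
```

Running a few batches with `use_rkl=False` and confirming the other terms stay finite would localize the problem to the reverse-KL computation.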