franciscoliu/SKU

the value of the loss function appears to be nan


Hi! Thanks for the amazing work!
While reproducing the code, the value of the loss function becomes nan during the finetuning process:

batch: 0, GD_loss: 8.13, RD_loss: 7.95, reversed_kl_loss: nan, combined_loss: nan, 
batch: 1, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 2, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 3, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 4, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 5, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 6, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 7, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 8, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 
batch: 9, GD_loss: nan, RD_loss: nan, reversed_kl_loss: nan, combined_loss: nan, 

I did not change the hyperparameters provided in the demo except for the learning rate. I have tried both increasing and decreasing the learning rate, but neither helped.

Hi, thank you for the question. It is hard to tell from the output alone. I reran the code on my side and the problem did not appear. Have you tried commenting out each loss term to see whether each one works fine individually?
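
For instance, something along these lines might help narrow it down (a minimal sketch assuming PyTorch; the function names, the reversed-KL formulation, and the per-term check are illustrative assumptions mirroring the log output above, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def reversed_kl(student_logits, teacher_logits):
    # Hypothetical reversed-KL term, computed entirely in log space:
    # taking log() of a zero probability yields -inf, whose gradient
    # turns every subsequent loss into nan -- consistent with the log
    # above, where reversed_kl_loss is already nan at batch 0.
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1).mean()

def debug_losses(gd_loss, rd_loss, rkl_loss):
    # Check each term before summing, so the first nan can be blamed
    # on a specific loss rather than on combined_loss.
    for name, loss in [("GD_loss", gd_loss),
                       ("RD_loss", rd_loss),
                       ("reversed_kl_loss", rkl_loss)]:
        if torch.isnan(loss).any():
            raise RuntimeError(f"{name} is nan before combining")

# Optionally pinpoint the exact op that produced the first nan
# during the backward pass.
torch.autograd.set_detect_anomaly(True)
```

Since reversed_kl_loss is the only term that is nan at batch 0 in your log, the KL computation is the first place I would check. Note that set_detect_anomaly slows training considerably, so disable it once the offending op is found.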