Gradient explosion?
huajiang123 opened this issue · 5 comments
Hi, jhljx. What a brief and beautiful implementation! But when I run the code on my machine, the model's predictions become NaN after a few epochs, and I found that the model parameters are updated to NaN through backpropagation. I am not sure whether this is caused by gradient explosion; if it is, how can I solve it? I have tried decreasing the learning rate and the batch size, but that doesn't seem to work.
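If the NaNs really do come from exploding gradients, the usual remedy is gradient norm clipping before the optimizer step (in PyTorch, `torch.nn.utils.clip_grad_norm_`). The sketch below is a plain-Python illustration of what that operation does, not code from the GKT repository:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale a flat list of gradient values in place so that their
    global L2 norm does not exceed max_norm; return the pre-clip norm.
    Plain-Python stand-in for torch.nn.utils.clip_grad_norm_."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads[:] = [g * scale for g in grads]
    return total_norm
```

In a PyTorch training loop the equivalent call sits between `loss.backward()` and `optimizer.step()`, so that one oversized gradient cannot push the parameters to NaN.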
Hello there. Thanks for your feedback. I'd like to confirm some details.
- What dataset do you use for running the GKT?
- Have you modified the hyper-parameter values of the model?
If you didn't use the dataset we provided, you need to check the data preprocessing part of the code to ensure that GKT receives the correct input.
If you did use the dataset in this repository, the error is confusing, because we've tested that dataset several times and verified that the code runs successfully on it.
Hello, jhljx. Thanks for helping me solve the problem. First, I used the dataset 'skill_builder_data.csv' provided by the repository. Second, I changed the hyper-parameter 'batch_size' from its default of 128 to 16, because when I tried to run the code on a GPU (2080 Ti, 12 GB) it reported an out-of-memory error, so I decreased the batch size.
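Dropping the batch size from 128 to 16 changes the training dynamics, which can itself destabilize a run. Gradient accumulation recovers the effective batch size of 128 on limited GPU memory: compute gradients on small micro-batches and average them before one optimizer step. The plain-Python sketch below (a hypothetical `w * x` regression model, not GKT's) shows that the size-weighted average of micro-batch gradients equals the full-batch gradient:

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y ≈ w * x
    over one batch of scalar inputs and targets."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate micro-batch gradients, weighted by micro-batch size.
    Mathematically identical to the full-batch gradient, but each
    micro-batch needs only micro_batch samples in memory at once."""
    total, n = 0.0, len(xs)
    for i in range(0, n, micro_batch):
        xs_mb, ys_mb = xs[i:i + micro_batch], ys[i:i + micro_batch]
        total += grad_mse(w, xs_mb, ys_mb) * len(xs_mb)
    return total / n
```

In PyTorch this corresponds to calling `loss.backward()` on each micro-batch (scaling the loss by the micro-batch fraction) and calling `optimizer.step()` only once per accumulation cycle.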
Okay, then I think you should check whether your Python package versions are consistent with ours. The program runs successfully on 'skill_builder_data.csv' on our server.
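A quick way to compare environments is to print the installed version of each dependency. The snippet below uses the standard library's `importlib.metadata`; the package list is a guess at typical GKT dependencies, so substitute whatever the repository's requirements actually name:

```python
from importlib import metadata

def report_versions(packages):
    """Map each distribution name to its installed version,
    or 'not installed' when it cannot be found."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

# Hypothetical dependency list -- replace with the repo's requirements.
print(report_versions(["torch", "numpy", "pandas", "scikit-learn"]))
```

Comparing this output between the failing machine and the maintainer's server narrows down whether a version mismatch is the culprit.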
Well, I hope you are right, jhljx : ) , since a package version problem should be easy to solve. Lastly, thank you very much, jhljx. Wish me luck in solving it : )
Great! Good luck!