keiohta/tf2rl

Reward nan

haoyu-x opened this issue · 15 comments

Hi,
Thanks for sharing this great project!

When I run GAIL/GAIfO on Pendulum-v0, the return sometimes becomes nan, and I'm wondering why.

(venv) haoyux@haoyux-ThinkPad:~/tf2rl-master/examples$ python3 run_gaifo_ddpg.py --save-test-path --save-test-movie --expert-path-dir /home/haoyux/tf2rl-master/examples/results/20200725T223711.222453_SAC_ --test-interval 20000

01:32:03.873 [INFO] (irl_trainer.py:74) Total Epi:   157 Steps:   31400 Episode Steps:   200 Return:   nan FPS: 58.40
01:32:07.445 [INFO] (irl_trainer.py:74) Total Epi: 158 Steps: 31600 Episode Steps: 200 Return: nan FPS: 56.01
01:32:11.351 [INFO] (irl_trainer.py:74) Total Epi: 159 Steps: 31800 Episode Steps: 200 Return: nan FPS: 53.48
01:32:14.457 [INFO] (irl_trainer.py:74) Total Epi: 160 Steps: 32000 Episode Steps: 200 Return: nan FPS: 64.44

(I also tried a harder environment, robosuite: it always produces a nan reward, and the run gets killed.)

Hi @haoyu-x, thanks for reporting the error!
I'm afraid I can't work on this immediately because I'm busy with a paper deadline a week from now, but I'll fix it after that.

Sure, good luck with CoRL 👍

Hi @keiohta, I suspect this happens because the GAIL/GAIfO reward needs to be normalized, but I'm not sure. Will you have time to look into this issue soon? Thanks a lot!

Here is how the weird nan bug plays out. I'm using tf2rl's GAIfO with robosuite. During training, the discriminator output suddenly becomes nan, even though the discriminator input looks normal. That nan output is used as the reward to update the policy, which produces a nan action, which in turn triggers a MuJoCo error. With tf2rl GAIL/GAIfO on the Gym Pendulum environment, the nan error happens only some of the time, but with robosuite it always happens. I have spent a lot of time on this and have no idea how to fix it. Could you offer some help? Thanks!
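One way to narrow down a failure like this is to check the reward for non-finite values before it reaches the policy update, so the run stops at the first bad value instead of at the later MuJoCo error. The sketch below uses plain NumPy. `check_finite_reward` is a hypothetical helper and not part of tf2rl, and where to call it in the training loop is an assumption.

```python
import numpy as np


def check_finite_reward(reward, step):
    """Raise as soon as a reward contains NaN or +/-inf.

    Hypothetical debugging helper; not part of tf2rl.
    """
    reward = np.asarray(reward, dtype=np.float64)
    if not np.all(np.isfinite(reward)):
        raise FloatingPointError(
            "non-finite reward at step {}: {}".format(step, reward))
    return reward


# A finite reward passes through unchanged.
ok = check_finite_reward([-0.5, -1.2], step=0)

# A NaN reward is caught before it can reach the policy update.
try:
    check_finite_reward([np.nan, -1.2], step=1)
    caught = False
except FloatingPointError:
    caught = True
```

Calling a check like this right after the discriminator produces the reward would show at which training step the output first goes bad.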

Hi @haoyu-x, I'm going to look into this issue. In the meantime, can you try setting enable_sn=True when creating the GAIL/GAIfO instance? One possible cause is that GAN-like training is unstable; Spectral Normalization (SN) helps stabilize it (see #14, which compares training with and without SN: #14 (comment)).
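For readers unfamiliar with Spectral Normalization, here is a minimal NumPy sketch of the idea: each weight matrix is divided by an estimate of its largest singular value, obtained by power iteration, which bounds how much a layer can amplify its input. This is illustrative only and is not tf2rl's implementation; `spectral_normalize` and its parameters are hypothetical names.

```python
import numpy as np


def spectral_normalize(w, n_iters=100, eps=1e-12):
    """Divide a 2-D weight matrix by an estimate of its largest singular value.

    The estimate comes from power iteration. Illustrative sketch only;
    not tf2rl's implementation.
    """
    rng = np.random.RandomState(0)
    u = rng.normal(size=w.shape[0])
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ w @ v  # estimated largest singular value
    return w / sigma


# A deliberately large random weight matrix.
w = np.random.RandomState(1).normal(size=(64, 32)) * 10.0
w_sn = spectral_normalize(w)

# After normalization the largest singular value is close to 1.
top_singular_value = np.linalg.svd(w_sn, compute_uv=False)[0]
```

Practical implementations typically run a single power-iteration step per training update and keep `u` between updates, rather than iterating to convergence as this sketch does.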

Thanks for trying SN. By the way, what's your TensorFlow version? I have tested on 2.0.0 and do not support later versions yet (but will soon).

Thanks for your advice. For now I'm able to run with enable-sn on tf 2.0.0, but I'm not sure whether the nan error will happen again. I'll update later.

Thanks @haoyu-x for trying tf2.0.0!! I'll do my best to support tf2.2.0, but it might take time.
Anyway, please let me know if you run into any errors. That definitely helps improve this library.

Glad to hear that!
Please keep this issue open until you can successfully run this on tf2.2 (or later). I'll work on that.

Hi @haoyu-x, I think you can now run the original script on tf2.2 or later (PR #95).
Please confirm that your script works without any problems, then close this issue.

I think this issue has been resolved by the PR in my previous comment.
Also, you should be able to run on tf2.3 with the latest master.

Please reopen this issue if you encounter the same problem.