Reward nan

Question

haoyu-x opened this issue 4 years ago · 15 comments

Hi,
Thanks for sharing this great project!

When I run GAIL/GAIfO on Pendulum-v0, the return sometimes becomes NaN, and I'm wondering why.

(venv) haoyux@haoyux-ThinkPad:~/tf2rl-master/examples$ python3 run_gaifo_ddpg.py --save-test-path --save-test-movie --expert-path-dir /home/haoyux/tf2rl-master/examples/results/20200725T223711.222453_SAC_ --test-interval 20000

01:32:03.873 [INFO] (irl_trainer.py:74) Total Epi:   157 Steps:   31400 Episode Steps:   200 Return:   nan FPS: 58.40

01:32:07.445 [INFO] (irl_trainer.py:74) Total Epi:   158 Steps:   31600 Episode Steps:   200 Return:   nan FPS: 56.01

01:32:11.351 [INFO] (irl_trainer.py:74) Total Epi:   159 Steps:   31800 Episode Steps:   200 Return:   nan FPS: 53.48

01:32:14.457 [INFO] (irl_trainer.py:74) Total Epi:   160 Steps:   32000 Episode Steps:   200 Return:   nan FPS: 64.44

(I also tried a harder environment, robosuite. There the reward is always NaN and the run gets killed.)

Answer 1 · 2020-07-26T01:08:07.000Z

Hi @haoyu-x, thanks for reporting the error!
I'm afraid I can't work on this immediately because I'm busy with a paper deadline in a week, but I'll fix it after that.

Answer 2 · 2020-07-26T10:12:14.000Z

Sure, good luck with your CoRL 👍

Answer 3 · 2020-07-29T15:14:07.000Z

Hi @keiohta, I assume this happens because the GAIL/GAIfO reward needs to be normalized, but I'm not sure. Will you have time to look into this issue in the next few days? Thanks a lot!

Answer 4 · 2020-07-29T18:29:08.000Z

Here is how the weird NaN bug goes. I'm using tf2rl GAIfO with robosuite. During training, the discriminator output suddenly becomes NaN, even though the discriminator input looks normal. Updating the policy with that NaN output (the reward) then produces a NaN action, which triggers a MuJoCo error.

With tf2rl GAIL/GAIfO on gym Pendulum, the NaN error happens only sometimes, but with robosuite it happens every time. It has cost me a lot of time and I have no idea how to fix it. Could you offer some help? Thanks!
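One way a normal-looking input can still produce a non-finite reward is purely numerical: if the reward is the log of a sigmoid output, a saturated sigmoid underflows to 0 and the log becomes -inf, which can turn into NaN once it enters loss or gradient arithmetic. The sketch below is a minimal illustration under that assumption, not tf2rl's actual code, and it may not be the cause in this thread. `softplus`, `log_sigmoid`, and `check_finite` are hypothetical helper names. It computes log D directly from the logit in a stable form and fails loudly before a bad reward could reach the policy update.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sigmoid(logit: float) -> float:
    """Stable log(sigmoid(logit)); never evaluates log(0)."""
    return -softplus(-logit)


def check_finite(name: str, values) -> None:
    """Raise as soon as a NaN or inf shows up, naming the offending entry."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise FloatingPointError(f"{name}[{i}] is not finite: {v}")


# Hypothetical discriminator logits, including two saturated ones.
logits = [-800.0, -2.0, 0.0, 3.0, 800.0]
rewards = [log_sigmoid(z) for z in logits]
check_finite("reward", rewards)  # passes: every reward is finite
```

Calling a check like `check_finite` on the discriminator output right after each update would at least pinpoint the first step at which the values go bad, instead of surfacing later as a simulator error.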

Answer 5 · 2020-07-29T20:52:22.000Z

Hi @haoyu-x, I'm going to look into this issue. In the meantime, can you try setting enable_sn=True when creating GAIL/GAIfO? One possible problem is that GAN-like training is unstable, and Spectral Normalization (SN) certainly helps stabilize the training (see the comment in #14 that compares training with and without SN).
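For readers unfamiliar with Spectral Normalization, the idea is to divide each weight matrix by an estimate of its largest singular value, so that a layer cannot amplify its input arbitrarily. The following is a minimal pure-Python sketch of that idea using power iteration. It is not the tf2rl implementation, and all function names are hypothetical. Implementations inside a training loop typically keep the vector `u` between steps and run only one iteration per update; this sketch runs many iterations from a fixed start vector for clarity.

```python
import math


def matvec(w, v):
    """Compute W v for a matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def transpose(w):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*w)]


def l2_normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(w, n_iters=50):
    """Estimate the largest singular value of W by power iteration."""
    wt = transpose(w)
    u = [1.0] * len(w)  # fixed start vector; a random one is the usual choice
    for _ in range(n_iters):
        v = l2_normalize(matvec(wt, u))
        u = l2_normalize(matvec(w, v))
    # sigma = u^T W v
    return sum(ui * x for ui, x in zip(u, matvec(w, v)))


def spectrally_normalize(w, n_iters=50):
    """Return W / sigma(W), whose largest singular value is about 1."""
    sigma = spectral_norm(w, n_iters)
    return [[x / sigma for x in row] for row in w]


w = [[3.0, 0.0], [0.0, 1.0]]  # singular values are 3 and 1
sigma = spectral_norm(w)        # close to 3
w_sn = spectrally_normalize(w)  # close to [[1, 0], [0, 1/3]]
```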

Answer 6 · 2020-07-30T08:01:49.000Z

Thanks for your reply. I tried enable-sn, but there's another issue:

Traceback (most recent call last):
  File "run_gaifo_robosuite.py", line 126, in <module>
    gpu=args.gpu)
  File "/home/haoyux/tf2rl-master/tf2rl/algos/gaifo.py", line 41, in __init__
    units=units, enable_sn=enable_sn)
  File "/home/haoyux/tf2rl-master/tf2rl/algos/gaifo.py", line 26, in __init__
    self([dummy_state, dummy_next_state])
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/haoyux/tf2rl-master/tf2rl/algos/gail.py", line 29, in call
    features = self.l1(features)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/haoyux/tf2rl-master/tf2rl/networks/spectral_norm_dense.py", line 52, in call
    rank = common_shapes.rank(inputs)
AttributeError: module 'tensorflow.python.framework.common_shapes' has no attribute 'rank'

As for the NaN reward, I suppose it's caused by the discriminator architecture.
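The traceback shows the SN layer calling `common_shapes.rank`, a helper in the private `tensorflow.python.framework` package that is not available in this TensorFlow install. One way to avoid depending on such a private helper is to read the rank from the tensor's public `shape` attribute. The sketch below illustrates that approach; it is not necessarily how the library fixed the problem. `static_rank` and `FakeTensor` are hypothetical names, and shapes of unknown rank are not handled.

```python
def static_rank(x):
    """Return the number of dimensions of x using only its public `shape`."""
    shape = x.shape
    rank = getattr(shape, "rank", None)  # tf.TensorShape exposes `.rank`
    if rank is None:
        rank = len(shape)  # plain tuples/lists fall back to len()
    return rank


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without TensorFlow installed."""

    def __init__(self, shape):
        self.shape = shape


print(static_rank(FakeTensor((32, 3))))  # 2
```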

Answer 7 · 2020-07-30T13:31:44.000Z

Thanks for trying SN. By the way, what's your TensorFlow version? I have tested on 2.0.0 and don't support later versions yet (but I'm going to soon).

Answer 8 · 2020-07-30T13:35:37.000Z

I see. My version is 2.2.0. Could you first make SN support 2.2.0, so that I can check whether the NaN error disappears? Thanks.

Answer 9 · 2020-07-30T13:39:49.000Z

I'm also trying TensorFlow 2.0.0.

Answer 10 · 2020-07-30T13:51:13.000Z

Thanks for your advice. I'm now able to run with enable-sn on TF 2.0.0, but I'm not sure yet whether the NaN error will happen again. I'll update later.

Answer 11 · 2020-07-31T05:17:05.000Z

Thanks @haoyu-x for trying TF 2.0.0!! I'll do my best to support TF 2.2.0, but it might take time.
Anyway, please let me know if you hit any kind of error. That definitely helps improve this library.

Answer 12 · 2020-07-31T05:19:00.000Z

Thank you! I changed to TF 2.0.0 and enabled SN. Now it runs well, and the NaN bug no longer appears!

Answer 13 · 2020-07-31T05:42:51.000Z

Glad to hear that!
Please keep this issue open until you can successfully run this on TF 2.2 (or later). I'll work on that.

Answer 14 · 2020-08-08T12:52:58.000Z

Hi @haoyu-x, I think you can now run the original script on TF 2.2 or later (PR #95).
Please confirm that your script works without any problems, then close this issue.

Answer 15 · 2020-08-31T12:33:36.000Z

I think this issue has been resolved by the PR in my previous comment.
Also, you should be able to run on TF 2.3 with the latest master.

Please reopen this issue if you encounter the same problem.