KL, PolicyEntropy, PolicyLoss go to NaN after 31,455 episodes

Question

David-Clement-Senbionic opened this issue 6 years ago · 6 comments

David-Clement-Senbionic commented 6 years ago

Hi there,
I have created a variant of the HumanStandup-v2 environment in gym that uses a much simpler simulated robot, defined in a MuJoCo-format XML file. I have tested this model in both MuJoCo and gym, and it seems to work fine.
I tested HumanStandup-v2 training on my hardware/software configuration, and it ran well for 50,000 episodes. I then ran the identical setup with our robot model, using the same reward function as the standard HumanStandup-v2. The only substantive difference between the two runs is the MuJoCo model.
When I ran the training on our model, I got:

***** Episode 31455, Mean R = 28911.0 *****
Beta: 6.91
ExplainedVarNew: 0.913
ExplainedVarOld: 0.812
KL: nan
PolicyEntropy: nan
PolicyLoss: nan
Steps: 672
ValFuncLoss: 114

Traceback (most recent call last):
  File "./train.py", line 334, in <module>
    main(**vars(args))
  File "./train.py", line 290, in main
    trajectories = run_policy(env, policy, scaler, logger, episodes=batch_size)
  File "./train.py", line 135, in run_policy
    observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
  File "./train.py", line 105, in run_episode
    obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
  File "/home/david/source/gym/gym/wrappers/monitor.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/david/source/gym/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/david/source/gym/gym/envs/Senbionic/ballbotEnv.py", line 28, in step
    self.do_simulation(a, self.frame_skip)
  File "/home/david/source/gym/gym/envs/mujoco/mujoco_env.py", line 100, in do_simulation
    self.sim.step()
  File "source/mujoco-py/mujoco_py/mjsim.pyx", line 119, in mujoco_py.cymj.MjSim.step
  File "source/mujoco-py/mujoco_py/cymj.pyx", line 115, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "source/mujoco-py/mujoco_py/cymj.pyx", line 75, in mujoco_py.cymj.c_warning_callback
  File "/home/david/.conda/envs/gym35/lib/python3.5/site-packages/mujoco_py-1.50.1.53-py3.5.egg/mujoco_py/builder.py", line 319, in user_warning_raise_exception
    raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Unknown warning type Time = 0.0000.

I ran it again and it did the same thing at Episode 1280.

Any suggestions on how to approach overcoming this?

Many thanks for any advice.
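
Editorial sketch: the traceback shows the crash surfacing inside env.step, one step after the policy statistics became NaN. One way to keep a run from dying at that point might be to guard the step call, as sketched below. This is a minimal illustration under assumptions, not code from train.py: safe_step and FakeEnv are hypothetical names, and the broad except clause stands in for catching mujoco_py.builder.MujocoException. It only turns the crash into an early episode end; a policy that already outputs NaN would still need to be restored from an earlier state.

```python
import math


def safe_step(env, action):
    """Step `env`, ending the episode instead of crashing on bad input.

    Returns the usual (obs, reward, done, info) tuple. When the step is
    aborted, obs is None, done is True, and info["aborted"] says why.
    """
    # Refuse to send NaN or infinite actions to the simulator.
    if not all(math.isfinite(float(a)) for a in action):
        return None, 0.0, True, {"aborted": "non-finite action"}
    try:
        return env.step(action)
    except Exception as exc:  # assumption: in practice, catch MujocoException
        return None, 0.0, True, {"aborted": repr(exc)}


class FakeEnv:
    """Hypothetical stand-in for a gym environment, used only for the demo."""

    def step(self, action):
        if abs(action[0]) > 1e6:
            raise RuntimeError("simulated instability")
        return [0.0], 1.0, False, {}


env = FakeEnv()
ok = safe_step(env, [0.5])                  # normal step passes through
nan_result = safe_step(env, [float("nan")])  # blocked before reaching env
blown = safe_step(env, [1e9])               # exception converted to done=True
```

The caller would need to discard or truncate a trajectory whose last observation is None before using it for a policy update.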

Answer 1 · 2018-04-26T11:22:51.000Z

David, without looking at this in more detail, my first suggestion would be to reduce the policy learning rate by 10x and see if that helps. Sorry I haven't been able to look more carefully. Pat
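
Editorial sketch: the suggestion above can be paired with a check on the logged statistics, so that a run notices non-finite values as soon as they appear. The snippet below is a rough illustration only. The helper names and the starting learning rate of 3e-4 are assumptions, not values from train.py; the metric values are copied from the Episode 31455 log in the question.

```python
import math


def non_finite_metrics(stats):
    """Return the names of logged metrics whose values are NaN or infinite."""
    return [name for name, value in stats.items() if not math.isfinite(value)]


def reduced_lr(lr, factor=10.0):
    """Scale a learning rate down by `factor` (10x, as suggested above)."""
    return lr / factor


# Values taken from the Episode 31455 log in the question.
stats = {
    "Beta": 6.91,
    "ExplainedVarNew": 0.913,
    "ExplainedVarOld": 0.812,
    "KL": float("nan"),
    "PolicyEntropy": float("nan"),
    "PolicyLoss": float("nan"),
    "ValFuncLoss": 114.0,
}

bad = non_finite_metrics(stats)

policy_lr = 3e-4  # hypothetical starting value, not read from train.py
if bad:
    # Restart from a known-good state with a 10x smaller policy learning rate.
    policy_lr = reduced_lr(policy_lr)
```

Lowering the rate does not repair weights that are already NaN, so the smaller value would apply to a fresh or restored run.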

Answer 2 · 2018-04-26T17:24:07.000Z

It seemed to go away by reconfiguring the mujoco model parameters. I believe it was just mujoco hitting an "exploding" result causing a cascade effect.
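
Editorial sketch: the thread does not say which model parameters were changed. One commonly tried remedy for an unstable MuJoCo simulation is a smaller integration timestep, set through the timestep attribute of the option element in the model XML. The snippet below is an assumption-laden illustration of that single change, not the fix that was actually applied here. The function name scale_timestep and the toy model string are hypothetical; 0.002 is used as the fallback because it is MuJoCo's documented default timestep.

```python
import xml.etree.ElementTree as ET


def scale_timestep(mjcf_xml, factor=0.5, default_timestep=0.002):
    """Return the MJCF string with <option timestep=...> multiplied by `factor`.

    If the model has no <option> element or no timestep attribute, the
    fallback `default_timestep` is scaled instead.
    """
    root = ET.fromstring(mjcf_xml)
    option = root.find("option")
    if option is None:
        option = ET.SubElement(root, "option")
    current = float(option.get("timestep", default_timestep))
    option.set("timestep", repr(current * factor))
    return ET.tostring(root, encoding="unicode")


# Toy model for illustration only; a real robot model has bodies and actuators.
model = '<mujoco model="toy"><option timestep="0.01"/><worldbody/></mujoco>'
smaller = scale_timestep(model)
```

A smaller timestep changes how much simulated time each env.step covers unless frame_skip is adjusted to compensate, so rewards and episode lengths may not be directly comparable afterwards.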

Answer 3 · 2018-04-27T11:18:32.000Z

Were you able to get your humanoid to stand up? If so, would love to see a video.

Answer 4 · 2018-04-27T12:53:53.000Z

Hi Patrick, I only ran 50,000 episodes but it seemed to be working well. https://youtu.be/KEHqKpSNuJ0 Cool stuff 😎 David

Answer 5 · 2019-02-28T11:06:16.000Z

> It seemed to go away by reconfiguring the mujoco model parameters. I believe it was just mujoco hitting an "exploding" result causing a cascade effect.

@David-Clement-Senbionic Hi David, I have the same problem. I created a new MuJoCo humanoid model with human-like parameters, but at some point my simulation explodes like yours did. How did you reconfigure your MuJoCo model parameters? If possible, could you also share the working code? Did you change your reward function for standing up?

Answer 6 · 2019-02-28T11:18:48.000Z

@David-Clement-Senbionic My mistake, the code is already shared :D But MuJoCo optimization is still an issue for me. Any help would be appreciated :)