pat-coady/trpo

KL, PolicyEntropy, PolicyLoss go to NaN after 31,455 episodes

David-Clement-Senbionic opened this issue · 6 comments

Hi there,
I have created a variant of the HumanoidStandup-v2 environment in Gym with a much simpler simulated robot, described by a MuJoCo-format XML (MJCF) model file. I have tested this model in both MuJoCo and Gym, and it seems to work fine.
I tested HumanoidStandup-v2 training on my hardware/software configuration, and it trained well up to 50,000 episodes. I then ran the identical setup with our robot model, using the same reward function as the standard HumanoidStandup-v2. The only substantive difference between the two runs is the MuJoCo model.
When I ran the training on our model, I got:

***** Episode 31455, Mean R = 28911.0 *****
Beta: 6.91
ExplainedVarNew: 0.913
ExplainedVarOld: 0.812
KL: nan
PolicyEntropy: nan
PolicyLoss: nan
Steps: 672
ValFuncLoss: 114

Traceback (most recent call last):
File "./train.py", line 334, in <module>
main(**vars(args))
File "./train.py", line 290, in main
trajectories = run_policy(env, policy, scaler, logger, episodes=batch_size)
File "./train.py", line 135, in run_policy
observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
File "./train.py", line 105, in run_episode
obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
File "/home/david/source/gym/gym/wrappers/monitor.py", line 31, in step
observation, reward, done, info = self.env.step(action)
File "/home/david/source/gym/gym/wrappers/time_limit.py", line 31, in step
observation, reward, done, info = self.env.step(action)
File "/home/david/source/gym/gym/envs/Senbionic/ballbotEnv.py", line 28, in step
self.do_simulation(a, self.frame_skip)
File "/home/david/source/gym/gym/envs/mujoco/mujoco_env.py", line 100, in do_simulation
self.sim.step()
File "source/mujoco-py/mujoco_py/mjsim.pyx", line 119, in mujoco_py.cymj.MjSim.step
File "source/mujoco-py/mujoco_py/cymj.pyx", line 115, in mujoco_py.cymj.wrap_mujoco_warning.exit
File "source/mujoco-py/mujoco_py/cymj.pyx", line 75, in mujoco_py.cymj.c_warning_callback
File "/home/david/.conda/envs/gym35/lib/python3.5/site-packages/mujoco_py-1.50.1.53-py3.5.egg/mujoco_py/builder.py", line 319, in user_warning_raise_exception
raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Unknown warning type Time = 0.0000.

I ran it again, and the same failure occurred at episode 1280.

Any suggestions on how to approach this?

Many thanks for any advice.
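
One defensive option, sketched here under stated assumptions: wrap the `env.step` call from `run_episode` so that a non-finite action, a simulator exception, or a non-finite observation ends the episode instead of crashing the whole run. `safe_step` and its `sim_errors` parameter are hypothetical names, not part of train.py or Gym. The sketch assumes the classic Gym API (`step` returns `obs, reward, done, info`) and a Box `action_space`; in a real mujoco-py setup you would pass `sim_errors=(mujoco_py.builder.MujocoException,)`, the exception class shown in the traceback above.

```python
import numpy as np


def safe_step(env, action, sim_errors=()):
    """Step `env`, ending the episode instead of crashing on bad numbers.

    Hypothetical helper. Assumes the classic Gym API (step() returns
    obs, reward, done, info) and a Box `action_space`. `sim_errors` is a
    tuple of exception types to treat as a failed step, e.g.
    (mujoco_py.builder.MujocoException,); the default () catches nothing.
    Returns obs=None when the step was aborted.
    """
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        # The policy already produced NaN/inf; do not feed it to the physics.
        return None, 0.0, True, {"aborted": "non-finite action"}

    # Keep actions inside the bounds the environment declares.
    action = np.clip(action, env.action_space.low, env.action_space.high)

    try:
        obs, reward, done, info = env.step(action)
    except sim_errors as err:
        # The simulator raised (e.g. a MuJoCo warning turned into an exception).
        return None, 0.0, True, {"aborted": "simulator error: {}".format(err)}

    if not (np.all(np.isfinite(obs)) and np.isfinite(reward)):
        # Physics diverged: stop before NaNs reach the scaler or the policy.
        return None, 0.0, True, {"aborted": "non-finite observation or reward"}
    return obs, reward, done, info
```

When `obs` comes back as `None`, the caller should discard the partial episode rather than append it to the batch. This only prevents the crash; if the NaNs originate in the model itself, the root cause still lies in the MJCF parameters.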

The problem seemed to go away after I reconfigured the MuJoCo model parameters. I believe MuJoCo was simply hitting an "exploding" (numerically unstable) simulation result, which then cascaded through the rest of training.
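
The "cascade" is easy to reproduce with any running statistic. In the sketch below, `running_mean` is a hypothetical stand-in for an observation scaler (an assumption; it is not the `Scaler` class from train.py), and `finite_trajectories` is a hypothetical filter that drops any trajectory containing NaN/inf before it reaches the scaler or the policy update. It assumes each trajectory is a dict of NumPy arrays, such as the `observes`/`actions`/`rewards` arrays returned by `run_episode`.

```python
import numpy as np


def running_mean(batches):
    """Incremental per-feature mean over batches (stand-in for a scaler)."""
    mean, count = None, 0
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        n = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        if mean is None:
            mean = batch_mean
        else:
            # Weighted update: a single NaN here makes `mean` NaN for good.
            mean = mean + (batch_mean - mean) * n / (count + n)
        count += n
    return mean


def finite_trajectories(trajectories):
    """Keep only trajectories whose arrays are entirely finite.

    Assumption: each trajectory is a dict mapping names to array-likes.
    """
    kept = []
    for traj in trajectories:
        if all(np.all(np.isfinite(np.asarray(v, dtype=np.float64)))
               for v in traj.values()):
            kept.append(traj)
    return kept
```

With `good = np.ones((3, 2))` and a copy `bad` whose `[0, 0]` entry is NaN, `running_mean([good, bad, good])` returns `[nan, 1.]`: once a NaN enters the running statistic it never leaves, and every observation normalized with it becomes NaN as well. That is one plausible route to the `KL: nan` and `PolicyEntropy: nan` lines in the log. Filtering first with `finite_trajectories` keeps the mean at `[1., 1.]`.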

@David-Clement-Senbionic Hi David, I have the same problem. I created a new MuJoCo humanoid model with human-like parameters, but at some point my simulation explodes like yours did. How did you reconfigure your MuJoCo model parameters? Also, if possible, could you share the working code? Did you change your reward function for standing up?

@David-Clement-Senbionic My mistake, the code is already shared :D But MuJoCo optimization is still an issue for me. Any help would be appreciated :)
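
The thread does not say which model parameters were changed. As a hedged starting point only, this sketch programmatically sets two common MJCF stability knobs on the `<option>` element: the simulation `timestep` and the `integrator` (MJCF accepts `"Euler"` and `"RK4"`). `set_sim_options` is a hypothetical helper, and the values in the test are placeholders, not settings verified for either robot.

```python
import xml.etree.ElementTree as ET


def set_sim_options(mjcf_text, timestep=None, integrator=None):
    """Return MJCF text with <option timestep=... integrator=...> set.

    Hypothetical helper; which knobs actually stabilise a given model is
    an assumption to be tested, not something this thread confirms.
    Note: ElementTree drops XML comments when re-serialising.
    """
    root = ET.fromstring(mjcf_text)
    if root.tag != "mujoco":
        raise ValueError("expected an MJCF document with a <mujoco> root")
    option = root.find("option")
    if option is None:
        # <option> is a direct child of <mujoco>; create it if missing.
        option = ET.Element("option")
        root.insert(0, option)
    if timestep is not None:
        option.set("timestep", repr(float(timestep)))
    if integrator is not None:
        option.set("integrator", integrator)
    return ET.tostring(root, encoding="unicode")
```

In Gym's `MujocoEnv`, the control period `dt` is `timestep * frame_skip`, so if you halve the timestep you may also want to double `frame_skip` to keep the same control rate for the policy.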