pat-coady/trpo

Trouble using pybullet and roboschool envs

llecam opened this issue · 11 comments

Hello Pat,

Thanks for your great repo!

I read some of the closed issues but didn't find one matching the problem I'm encountering.

I tried to launch your train.py on RoboschoolInvertedPendulum-v1, which is supposed to be the same as the MuJoCo environment, but I got the following error:

```
File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```

The error occurs on line 108: `obs, reward, done, _ = env.step(np.squeeze(action, axis=0))`. It looks like a dimension error, but I can't figure out what exactly the problem is.
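For what it's worth, that assertion fires when the cart position in roboschool's calc_state() goes non-finite, which usually means something upstream (often the action itself) was already NaN. Below is a sketch of a checked step wrapper one could swap in for the env.step() call in train.py; `checked_step` is a hypothetical helper, not part of the repo:

```python
import numpy as np

def checked_step(env, action):
    # Sketch only: sanity-check a sampled action before env.step().
    # A non-finite action (e.g. from an exploding policy) propagates into
    # the simulator state and trips roboschool's assert(np.isfinite(x)).
    a = np.squeeze(action, axis=0)
    assert np.all(np.isfinite(a)), 'non-finite action: {}'.format(a)
    assert a.shape == env.action_space.shape, (a.shape, env.action_space.shape)
    return env.step(a)
```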

I tried to work around this by using the corresponding pybullet environment, InvertedPendulumBulletEnv-v0. The program runs, but the pendulum doesn't seem to learn anything: the reward stays around 30.0 on average and makes no progress. Do you have any idea why?

I'm looking forward to your reply!

Léa

@llecam -- does my solution in this issue help?

Hi,

@pender Thanks, but that issue didn't really help; I don't have exactly the same problem.
@pat-coady I tried both the master branch and the aigym_evaluation branch and got the same results. With the pybullet environment it doesn't seem to learn.

Thanks, Léa

Hi,

I had a very similar issue, so I finally downloaded MuJoCo to try to understand where the problem comes from. I have the same issue with the MuJoCo InvertedPendulum as Léa has with pybullet: everything seems to be working fine, but the reward stays around 10 even after 5000 iterations. I tried both branches. You will find my log folder attached.
I am using Python 2.7; do you think that could be the problem?

Jan-10_09:29:48.zip

Thanks, Louis

Sorry I haven't had time to look at this yet.

Can you confirm it is a continuous control environment? In other words, the environment expects a vector of real numbers (not ints or bools).

Some of the gym environments are not continuous control.
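A quick way to check from a Python shell (just a sketch; the imports are only needed for their side effect of registering the environment ids with gym):

```python
import gym
import gym.spaces
import pybullet_envs  # registers the *BulletEnv-v0 ids
import roboschool     # registers the Roboschool* ids

for name in ['RoboschoolInvertedPendulum-v1', 'InvertedPendulumBulletEnv-v0']:
    env = gym.make(name)
    # Continuous control => Box action space of real-valued actions
    print(name, env.action_space, isinstance(env.action_space, gym.spaces.Box))
```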

@llecam - can you post code that reproduces the problem using roboschool and the inverted pendulum? I'll do my best to run it tonight or tomorrow.

Thank you for your time. I only used continuous environments.

The only modification I made to your code was adding imports to train.py to make it work with pybullet and/or roboschool:
```python
import pybullet as p
import pybullet_envs
import roboschool
```

With the roboschool version of the inverted pendulum environment, RoboschoolInvertedPendulum-v1, I get the following when I launch train.py with default parameters:
```
Traceback (most recent call last):
  File "train.py", line 334, in <module>
    main(**vars(args))
  File "train.py", line 287, in main
    run_policy(env, policy, scaler, logger, episodes=5)
  File "train.py", line 135, in run_policy
    observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
  File "train.py", line 105, in run_episode
    obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/monitoring.py", line 32, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/time_limit.py", line 36, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 97, in _step
    state = self.calc_state()  # sets self.pos_x self.pos_y
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```

With the pybullet version of the inverted pendulum, InvertedPendulumBulletEnv-v0, I don't get any error, but it seems to start over at each epoch without learning. I got the following results in my terminal with default parameters:
```
***** Episode 20, Mean R = 28.1 *****
Beta: 0.667
ExplainedVarNew: 1.11e-16
ExplainedVarOld: 0
KL: 3.12e-05
PolicyEntropy: 0.923
PolicyLoss: -0.000577
Steps: 562
ValFuncLoss: 0.0041

[2018-01-15 15:26:40,022] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000027.mp4
***** Episode 40, Mean R = 24.9 *****
Beta: 0.444
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 5.36e-07
PolicyEntropy: 0.922
PolicyLoss: -6.3e-05
Steps: 497
ValFuncLoss: 0.0042

[2018-01-15 15:26:40,967] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000064.mp4
***** Episode 60, Mean R = 22.9 *****
Beta: 0.296
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 1.16e-05
PolicyEntropy: 0.92
PolicyLoss: -0.000143
Steps: 458
ValFuncLoss: 0.00167

***** Episode 80, Mean R = 20.2 *****
Beta: 0.198
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 8.88e-06
PolicyEntropy: 0.92
PolicyLoss: -8.1e-05
Steps: 405
ValFuncLoss: 0.00259

***** Episode 100, Mean R = 24.4 *****
Beta: 0.132
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 1.17e-05
PolicyEntropy: 0.919
PolicyLoss: -0.000282
Steps: 487
ValFuncLoss: 0.00278

***** Episode 120, Mean R = 25.1 *****
Beta: 0.0878
ExplainedVarNew: 1.11e-16
ExplainedVarOld: 1.11e-16
KL: 8.52e-06
PolicyEntropy: 0.917
PolicyLoss: -0.000111
Steps: 503
ValFuncLoss: 0.00175

[2018-01-15 15:26:42,787] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000125.mp4
***** Episode 140, Mean R = 25.8 *****
Beta: 0.0585
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 2.54e-05
PolicyEntropy: 0.915
PolicyLoss: -0.000493
Steps: 516
ValFuncLoss: 0.00333

***** Episode 160, Mean R = 26.2 *****
Beta: 0.039
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 1.37e-06
PolicyEntropy: 0.914
PolicyLoss: 3.45e-05
Steps: 525
ValFuncLoss: 0.00406

***** Episode 180, Mean R = 26.4 *****
Beta: 0.026
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 2.13e-05
PolicyEntropy: 0.916
PolicyLoss: -0.000416
Steps: 528
ValFuncLoss: 0.00318

***** Episode 200, Mean R = 31.1 *****
Beta: 0.0173
ExplainedVarNew: 0
ExplainedVarOld: 0
KL: 8.34e-07
PolicyEntropy: 0.916
PolicyLoss: 1.93e-05
Steps: 621
ValFuncLoss: 0.00797
```
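If ExplainedVarNew follows the usual definition (an assumption on my part, not checked against the repo), the ~0 values above say the value function explains none of the variance in the observed returns, i.e. it isn't learning either:

```python
import numpy as np

def explained_variance(y_pred, y_true):
    # Usual definition: 1 - Var[y_true - y_pred] / Var[y_true].
    # ~1.0 means the value function tracks the returns;
    # ~0.0 means it explains nothing about them.
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)
```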

I only copied the first 200 episodes; the results are similar afterwards.

I'm running it on Python 2.7. Could that be the problem?

The problem was indeed that I was using Python 2.7. Thank you! It's working with Python 3.
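For anyone landing here later: my guess (not confirmed in the thread) is that the culprit was Python 2's integer division, which silently truncates ratios that Python 3 evaluates as floats:

```python
# This future import (a no-op on Python 3) must sit at the top of the
# module; it makes Python 2.7 use true division as well.
from __future__ import division

# Without it, Python 2.7 floors `/` on two ints, so ratios that should be
# fractional silently become 0 and can zero out updates downstream.
print(9 / 10)   # 0.9 on both Python 2.7 and Python 3 (0 without the import)
```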