Density in Mujoco Environments doesn't seem to change
jsalfity-hplabs opened this issue · 1 comment
Great work! We are trying to replicate your experiments.

Description of our setup: Ubuntu 16.04, with rl-generalization and Docker installed following the instructions in the README.

We came across behavior that seems incorrect. We wanted to see the performance of HalfCheetah when only density is varied, so we ran

python -m examples.run_experiments examples/test_density.yml /tmp/output

with the following YAML file:
models:
  # PPO2 Baselines.
  - name: PPO2
    train:
      command: |
        python3 -m examples.ppo2_baselines.train
          --env {environment}
          --output {output}
          --total-episodes {episodes}
          --lr {lr}
          --nsteps {nsteps}
          --nminibatches {nminibatches}
          --policy {policy}
      output: 'checkpoints/*'
      parameters: 'env-parameters-*.json'
    evaluate:
      command: |
        python3 -m examples.ppo2_baselines.evaluate
          --env {environment}
          --outdir {output}
          --eval-n-trials 1000
          --eval-n-parallel 1
          {model}
      output: 'evaluation.json'
    hyperparameters:
      episodes: 1500000
      policy: 'mlp'
      lr: [0.0003]
      nsteps: [256]
      nminibatches: 1

#############################################################################

environments:
  - train: SunblazeHalfCheetah-v0
    test:
      - SunblazeHalfCheetah-v0
      - SunblazeHalfCheetahRandomExtreme-v0  # edited so only density is changing
and with SunblazeHalfCheetahRandomExtreme modified in mujoco.py so that only the density changes (manually set to 1000000), as below:
class RandomExtremeHalfCheetah(RoboschoolXMLModifierMixin, ModifiableRoboschoolHalfCheetah):  # edited to only change density
    def randomize_env(self):
        self.density = 1000000  # manually changed density value
        with self.modify_xml('half_cheetah.xml') as tree:
            for elem in tree.iterfind('worldbody/body/geom'):
                elem.set('density', str(self.density))

    def _reset(self, new=True):
        if new:
            self.randomize_env()
        return super(RandomExtremeHalfCheetah, self)._reset(new)

    @property
    def parameters(self):
        parameters = super(RandomExtremeHalfCheetah, self).parameters
        parameters.update({'density': self.density})
        return parameters
Looking at the JSON output of run_experiments, the testing reward of the model trained on SunblazeHalfCheetah is nearly the same on both SunblazeHalfCheetah and SunblazeHalfCheetahRandomExtreme (with density manually set to 1000000). The last rewards from both testing environments are below:
"environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [26.933929443359375]}, {"success": false, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420", "environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [30.036670684814453]}, {"success": false, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420", "environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [25.795215606689453]}]}
"environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}, {"success": false, "reward": [40.01738739013672], "environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}, {"success": false, "reward": [26.907756805419922], "environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}]}
How can we confirm that the density is actually changing? It doesn't seem plausible that the Mujoco HalfCheetah simulation should be able to move at all with a density of 1000000, nor that it should have testing rewards similar to the nominal environment.
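One check we could think of is to query the parameters property after a reset and see whether the modified density is reported. A minimal sketch (assuming the sunblaze_envs package is the right import for registering the Sunblaze* ids with Gym, and that the base environment is reachable via env.unwrapped):

import gym
import sunblaze_envs  # registers the Sunblaze* environment ids (our assumption about the import path)

# Make the "extreme" environment and force one reset so randomize_env() runs.
env = gym.make('SunblazeHalfCheetahRandomExtreme-v0')
env.reset()

# `parameters` is the property defined on RandomExtremeHalfCheetah above;
# if the density change sticks, this should print {'density': 1000000, ...}.
print(env.unwrapped.parameters)

Even so, this only shows what the Python side reports, not whether the simulator reloaded the modified XML.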
I think the environment does change when you add RoboschoolForwardWalkerMujocoXML.__init__(self, self.model_xml, 'torso', action_dim=6, obs_dim=26, power=0.9) in randomize_env(self). I could be wrong, because I'm using a different version of Roboschool.
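For concreteness, a rough sketch of that suggestion applied to the class above (the constructor call is the one quoted in the comment; whether re-initializing inside randomize_env is safe, and whether self.model_xml points at the modified file, likely depends on the Roboschool version):

def randomize_env(self):
    self.density = 1000000
    with self.modify_xml('half_cheetah.xml') as tree:
        for elem in tree.iterfind('worldbody/body/geom'):
            elem.set('density', str(self.density))
    # Rebuild the walker from the (now modified) XML so the simulator actually
    # reloads the new density; constructor arguments copied from the comment above.
    RoboschoolForwardWalkerMujocoXML.__init__(
        self, self.model_xml, 'torso', action_dim=6, obs_dim=26, power=0.9)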