sunblaze-ucb/rl-generalization

Density in Mujoco Environments doesn't seem to change

jsalfity-hplabs opened this issue · 1 comment

Great work! We are trying to replicate your experiments.

Our setup: Ubuntu 16.04, with rl-generalization and Docker installed following the install instructions in the README.

We came across what looks like a bug. We wanted to see the performance of HalfCheetah when only the density is varied, so we ran python -m examples.run_experiments examples/test_density.yml /tmp/output with the following yml file:

models:
  # PPO2 Baselines.
  - name: PPO2
    train:
      command: |
        python3 -m examples.ppo2_baselines.train
          --env {environment}
          --output {output}
          --total-episodes {episodes}
          --lr {lr}
          --nsteps {nsteps}
          --nminibatches {nminibatches}
          --policy {policy}
      output: 'checkpoints/*'
      parameters: 'env-parameters-*.json'
    evaluate:
      command: |
        python3 -m examples.ppo2_baselines.evaluate
          --env {environment}
          --outdir {output}
          --eval-n-trials 1000
          --eval-n-parallel 1
          {model}
      output: 'evaluation.json'
    hyperparameters:
      episodes: 1500000
      policy: 'mlp'
      lr: [0.0003] 
      nsteps: [256]
      nminibatches: 1

 #############################################################################

environments:
  - train: SunblazeHalfCheetah-v0
    test:
      - SunblazeHalfCheetah-v0
      - SunblazeHalfCheetahRandomExtreme-v0 #edited so only density is changing

We also modified SunblazeHalfCheetahRandomExtreme in mujoco.py so that only the density changes, hard-coded to 1000000, as below:

class RandomExtremeHalfCheetah(RoboschoolXMLModifierMixin, ModifiableRoboschoolHalfCheetah): #edited to only change density 

    def randomize_env(self):
        self.density = 1000000 #manually changed density value

        with self.modify_xml('half_cheetah.xml') as tree:
            for elem in tree.iterfind('worldbody/body/geom'):
                elem.set('density', str(self.density))

    def _reset(self, new=True):
        if new:
            self.randomize_env()
        return super(RandomExtremeHalfCheetah, self)._reset(new)

    @property
    def parameters(self):
        parameters = super(RandomExtremeHalfCheetah, self).parameters
        parameters.update({'density': self.density})
        return parameters

Looking at the JSON output of run_experiments, the testing rewards of the SunblazeHalfCheetah model are nearly the same on SunblazeHalfCheetah and on SunblazeHalfCheetahRandomExtreme (with density manually set to 1000000). The last few evaluation records for both testing environments are below:

"environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [26.933929443359375]}, {"success": false, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420", "environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [30.036670684814453]}, {"success": false, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420", "environment": {"id": "SunblazeHalfCheetah-v0"}, "reward": [25.795215606689453]}]}
 "environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}, {"success": false, "reward": [40.01738739013672], "environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}, {"success": false, "reward": [26.907756805419922], "environment": {"id": "SunblazeHalfCheetahRandomExtreme-v0", "density": 1000000000}, "model": "examples/test_density_output/PPO2/SunblazeHalfCheetah-v0/checkpoints/182420"}]}

How can we confirm that the density is actually changing? It doesn't seem plausible that the Mujoco HalfCheetah simulation should be able to move at all with a density of 1000000, nor that its testing rewards should be so close to those of the nominal environment.
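One check we could do is instantiate the environment directly and print its parameters property (the one defined in the class above). A minimal sketch, assuming sunblaze_envs.make works as shown in the README:

import sunblaze_envs

env = sunblaze_envs.make('SunblazeHalfCheetahRandomExtreme-v0')
env.reset()  # _reset(new=True) calls randomize_env()
# parameters is the property defined above and should report the density
print(env.unwrapped.parameters)

This would at least confirm that randomize_env() runs and records the density, though not whether the physics simulation actually picks up the modified XML.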

I think the environment does change if you add RoboschoolForwardWalkerMujocoXML.__init__(self, self.model_xml, 'torso', action_dim=6, obs_dim=26, power=0.9) in randomize_env(self), so that the modified XML is actually reloaded into the simulation. I could be wrong, since I'm using a different version of roboschool.
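Roughly like this (a sketch only; the exact __init__ arguments may differ between roboschool versions):

    def randomize_env(self):
        self.density = 1000000

        with self.modify_xml('half_cheetah.xml') as tree:
            for elem in tree.iterfind('worldbody/body/geom'):
                elem.set('density', str(self.density))

        # Rebuild the walker from the modified XML; without this the
        # simulation may keep using the originally loaded model.
        RoboschoolForwardWalkerMujocoXML.__init__(
            self, self.model_xml, 'torso', action_dim=6, obs_dim=26, power=0.9)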