Implement VAIL
Experiment on Pendulum-v0
# Generate expert trajectories
$ python examples/run_sac.py --env-name=Pendulum-v0 --save-test-path --test-interval=100000 --max-steps 100000 --test-episodes=20 --gpu -1
# VAIL
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1
# VAIL with Spectral Normalization
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1 --enable-sn --dir-suffix SN
Results
- Score
  - The maximum score is 0.
  - VAIL is unstable. Adding Spectral Normalization stabilizes learning and improves the score.
- DDPG loss
- VAIL
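For readers unfamiliar with Spectral Normalization, here is a minimal NumPy sketch of the idea: divide a weight matrix by an estimate of its largest singular value, obtained by power iteration. This is not the tf2rl implementation behind `--enable-sn`. The function name `spectral_normalize` and its parameters are hypothetical, and the sketch runs many power iterations on a fixed matrix, whereas training code typically runs one iteration per update and keeps the vector `u` between steps.

```python
import numpy as np


def spectral_normalize(w, n_iters=50, eps=1e-12, seed=0):
    """Return (w / sigma, sigma), where sigma estimates the largest singular value of w.

    Hypothetical helper for illustration only; not part of tf2rl.
    """
    rng = np.random.default_rng(seed)
    # Random unit start vector in the output space of w.
    u = rng.normal(size=w.shape[0])
    u /= np.linalg.norm(u) + eps
    for _ in range(n_iters):
        # Alternate between the right and left singular vector estimates.
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    # With u and v close to the top singular vectors, u^T w v approximates sigma_max.
    sigma = float(u @ w @ v)
    return w / sigma, sigma


# Example: a random 64x32 weight matrix, as in a small dense layer.
w = np.random.default_rng(1).normal(size=(64, 32))
w_sn, sigma = spectral_normalize(w)
print("estimated sigma:", sigma)
print("spectral norm after normalization:", np.linalg.norm(w_sn, 2))
```

After normalization the layer's spectral norm is close to 1, which bounds how sharply the discriminator output can change with its input. That is the usual explanation for the stabilizing effect.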
Experiment on HalfCheetah-v2
# Generate expert trajectories
$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=500000 --max-steps 500000 --test-episodes=20 --gpu -1
# VAIL
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# VAIL_SN
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN
# GAIL
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# GAIL_SN
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN
Results
- All algorithms reproduced the expert score (~9000).
- Note: SAC can reach around 12000, but I cut expert training off at 0.5M steps to save time.
- No big difference between GAIL and VAIL, with or without Spectral Normalization.
- Gray: VAIL, Orange: VAIL + SN, Red: GAIL, Blue: GAIL + SN, Green: Expert (SAC)
TensorBoard output
- Score
Hi @keiohta
I couldn't reproduce these results. I got this error on HalfCheetah:
Traceback (most recent call last):
  File "examples/run_vail_ddpg.py", line 44, in <module>
    trainer()
  File "/home/ss/.local/lib/python3.8/site-packages/tf2rl/experiments/irl_trainer.py", line 51, in __call__
    next_obs, reward, done, _ = self._env.step(action) ##
  File "/home/ss/.local/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/half_cheetah.py", line 12, in step
    self.do_simulation(action, self.frame_skip)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py", line 125, in do_simulation
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/ss/projects/mujoco-py/mujoco_py/builder.py", line 363, in user_warning_raise_exception
    raise MujocoException(warn + 'Check for NaN in simulation.')
mujoco_py.builder.MujocoException: Unknown warning type Time = 24.7000.Check for NaN in simulation.
OS: Ubuntu 20
TF version: tested on 2.3 and 2.4
tf2rl: Master
Your input would be appreciated.
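One way to narrow this down, as a sketch only: the exception says the simulation state became non-finite, and one possible cause (an assumption, not confirmed in this thread) is a NaN or out-of-range action coming from the policy. The helper below, with the hypothetical name `check_action`, raises as soon as an action contains NaN or inf and otherwise clips it to the given bounds. It could be called on the action just before `env.step(action)`, with the bounds taken from `env.action_space.low` and `env.action_space.high`.

```python
import numpy as np


def check_action(action, low, high):
    """Raise on non-finite actions, otherwise clip to [low, high].

    Hypothetical debugging helper; not part of tf2rl or gym.
    """
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        # Failing here points at the policy/discriminator rather than MuJoCo.
        raise ValueError(f"non-finite action from policy: {action}")
    return np.clip(action, low, high)


# Example with bounds of [-1, 1] (assumed here for illustration).
print(check_action([0.5, 2.0, -3.0], -1.0, 1.0))

try:
    check_action([0.1, float("nan")], -1.0, 1.0)
except ValueError as err:
    print("caught:", err)
```

If this check fires before MuJoCo raises, the NaN originates in the networks (for example a diverging discriminator or critic) rather than in the simulator.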