keiohta/tf2rl

Implement VAIL

Closed this issue · 5 comments

Experiment on Pendulum-v0

# Generate expert trajectories
$ python examples/run_sac.py --env-name=Pendulum-v0 --save-test-path --test-interval=100000 --max-steps 100000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1
# VAIL with Spectral Normalization
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1 --enable-sn --dir-suffix SN

Results

  • Score
    • The maximum score is 0.
    • VAIL alone is unstable. Adding Spectral Normalization stabilizes learning and improves the score.
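
For readers unfamiliar with Spectral Normalization, the idea can be sketched in a few lines: estimate the largest singular value of a weight matrix by power iteration, then divide the weights by it. This is a minimal pure-Python illustration, not the tf2rl implementation behind `--enable-sn`. The function names (`spectral_norm`, `spectrally_normalize`) are hypothetical, and the weight matrix is assumed to be a plain list of rows.

```python
import math
import random


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Compute W.T @ u without building the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(x, eps=1e-12):
    """Scale x to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in W])
    for _ in range(n_iters):
        v = l2_normalize(matvec_t(W, u))
        u = l2_normalize(matvec(W, v))
    # sigma is approximately u^T W v once u and v have converged.
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

Practical implementations usually run a single power-iteration step per training update and keep `u` between updates. The many iterations here only make the standalone estimate accurate.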

190707_VAIL_Pendulum_score

  • DDPG loss

190707_VAIL_Pendulum_DDPG_loss

  • VAIL

190707_VAIL_Pendulum_info

Experiment on HalfCheetah-v2

# Generate expert trajectories
$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=500000 --max-steps 500000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# VAIL_SN
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

# GAIL
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# GAIL_SN
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

Results

  • All algorithms reproduced the expert score (~9000).
    • Note: SAC can reach around 12000, but training was cut off at 0.5M steps to save time.
  • There is no big difference between GAIL and VAIL, with or without Spectral Normalization.
    • Gray: VAIL, Orange: VAIL + SN, Red: GAIL, Blue: GAIL + SN, Green: Expert (SAC)

TensorBoard output

  • Score

190707_VAIL_GAIL_HalfCheetah_score

  • VAIL

190707_VAIL_HalfCheetah

  • GAIL

190707_GAIL_HalfCheetah

  • DDPG loss

190707_VAIL_GAIL_HalfCheetah_DDPG_loss

Hi @keiohta
I couldn't reproduce the same results. I got this error for HalfCheetah:

Traceback (most recent call last):
  File "examples/run_vail_ddpg.py", line 44, in <module>
    trainer()
  File "/home/ss/.local/lib/python3.8/site-packages/tf2rl/experiments/irl_trainer.py", line 51, in __call__
    next_obs, reward, done, _ = self._env.step(action) ##
  File "/home/ss/.local/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/half_cheetah.py", line 12, in step
    self.do_simulation(action, self.frame_skip)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py", line 125, in do_simulation
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/ss/projects/mujoco-py/mujoco_py/builder.py", line 363, in user_warning_raise_exception
    raise MujocoException(warn + 'Check for NaN in simulation.')
mujoco_py.builder.MujocoException: Unknown warning type Time = 24.7000.Check for NaN in simulation.

OS: Ubuntu 20
TF version: tested on 2.3 and 2.4
tf2rl: Master
Your input would be appreciated.

Hi @sasayesh, thanks for reporting the error. I'll try to reproduce the bug on Wednesday.
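
Until the bug is reproduced, one way to narrow it down is to check whether the policy is already emitting non-finite or out-of-range actions before they reach `env.step`. That this is the cause is an assumption; the traceback only shows the simulation state becoming invalid. The sketch below is not part of tf2rl, `sanitize_action` is a hypothetical helper, and the bounds are passed in explicitly (in gym they would normally come from `env.action_space.low` and `env.action_space.high`).

```python
import math


def sanitize_action(action, low, high):
    """Fail loudly on NaN/inf entries, then clip each entry to [low, high]."""
    for i, a in enumerate(action):
        if not math.isfinite(a):
            raise ValueError(f"non-finite action at index {i}: {a!r}")
    return [min(max(a, low), high) for a in action]


if __name__ == "__main__":
    # A finite but out-of-range action is clipped.
    print(sanitize_action([2.0, -0.5], -1.0, 1.0))
    # A NaN action raises before it can reach the simulator.
    try:
        sanitize_action([float("nan"), 0.0], -1.0, 1.0)
    except ValueError as err:
        print("caught:", err)
```

If the `ValueError` fires, the problem lies upstream in the policy or discriminator updates rather than in MuJoCo itself.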