Implement VAIL

Question


Closed this issue 6 years ago · 5 comments

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

keiohta commented 6 years ago

4535cfb
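The issue only names the paper. For reference, the variational discriminator bottleneck described there passes discriminator inputs through a stochastic encoder z ~ E(z|x). It penalizes the KL divergence between E(z|x) and a standard-normal prior, weighted by a Lagrange multiplier β. β is updated by dual gradient ascent so that the average KL stays near a target I_c. The NumPy sketch below shows only those loss terms. The function names, the expert-labelled-1 convention, and the `eps`/step-size arguments are illustrative assumptions, not the tf2rl implementation.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)) summed over latent dims, one value per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)


def update_beta(beta, mean_kl, info_constraint, step_size):
    """Dual gradient ascent on the multiplier beta, projected back onto beta >= 0."""
    return max(0.0, beta + step_size * (mean_kl - info_constraint))


def vdb_discriminator_loss(d_expert, d_policy, mean_kl, beta, info_constraint, eps=1e-8):
    """Binary cross-entropy (expert samples labelled 1) plus the weighted bottleneck term.

    d_expert, d_policy: discriminator probabilities D(z) for expert / policy samples.
    mean_kl: KL averaged over the batch of encoded samples.
    """
    bce = -np.mean(np.log(d_expert + eps)) - np.mean(np.log(1.0 - d_policy + eps))
    return bce + beta * (mean_kl - info_constraint)
```

In a real implementation these would be TensorFlow ops, so gradients flow back through the encoder's `mu` and `logvar` (sampled with the reparameterization trick). β itself is updated outside the gradient step.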

Answer 1 · 2019-07-07T00:58:29.000Z

Experiment on Pendulum-v0

# Generate expert trajectories
$ python examples/run_sac.py --env-name=Pendulum-v0 --save-test-path --test-interval=100000 --max-steps 100000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1
# VAIL with Spectral Normalization
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1 --enable-sn --dir-suffix SN

Results

Score
- The maximum possible score is 0 (Pendulum rewards are never positive).
- VAIL alone is unstable. Adding Spectral Normalization stabilizes learning and improves the score.
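Spectral Normalization (Miyato et al., 2018) rescales each weight matrix W to W / σ(W), where σ(W) is its largest singular value. This makes each linear layer 1-Lipschitz and is the usual explanation for its stabilizing effect on discriminators. σ(W) is estimated cheaply by power iteration, carrying the vector `u` across training steps. The NumPy sketch below assumes a dense weight laid out as (out, in); a Keras `Dense` kernel is (in, out) and would need transposing. It illustrates the technique and is not the code behind the `--enable-sn` flag.

```python
import numpy as np


def spectral_normalize(w, u, n_iters=1, eps=1e-12):
    """Return (W / sigma(W), updated u), estimating sigma(W) by power iteration.

    w: weight matrix of shape (out, in).
    u: persistent estimate of the top left singular vector, shape (out,).
    """
    for _ in range(n_iters):
        v = w.T @ u                       # right singular vector estimate, shape (in,)
        v = v / (np.linalg.norm(v) + eps)
        u = w @ v                         # left singular vector estimate, shape (out,)
        u = u / (np.linalg.norm(u) + eps)
    sigma = u @ w @ v                     # estimate of the largest singular value
    return w / sigma, u
```

During training, `n_iters=1` per step is typical because `u` is reused from the previous step and stays close to the true singular vector.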

(Figures omitted: DDPG loss, VAIL)

Answer 2 · 2019-07-07T22:13:43.000Z

Experiment on HalfCheetah-v2

# Generate expert trajectories
$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=500000 --max-steps 500000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# VAIL_SN
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

# GAIL
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# GAIL_SN
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

Results

All algorithms reproduced the expert score (~9000).
- Note: SAC can reach around 12000, but training was cut at 0.5M steps to save time.
No big difference between GAIL and VAIL, even with Spectral Normalization.
- Gray: VAIL, Orange: VAIL + SN, Red: GAIL, Blue: GAIL + SN, Green: Expert (SAC)

TensorBoard output (figures omitted): Score, VAIL, GAIL, DDPG loss

Answer 3 · 2021-07-25T14:02:35.000Z

Hi @keiohta,
I couldn't reproduce the results; I got this error on HalfCheetah:
Traceback (most recent call last):
  File "examples/run_vail_ddpg.py", line 44, in <module>
    trainer()
  File "/home/ss/.local/lib/python3.8/site-packages/tf2rl/experiments/irl_trainer.py", line 51, in __call__
    next_obs, reward, done, _ = self._env.step(action) ##
  File "/home/ss/.local/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/half_cheetah.py", line 12, in step
    self.do_simulation(action, self.frame_skip)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py", line 125, in do_simulation
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/ss/projects/mujoco-py/mujoco_py/builder.py", line 363, in user_warning_raise_exception
    raise MujocoException(warn + 'Check for NaN in simulation.')
mujoco_py.builder.MujocoException: Unknown warning type Time = 24.7000.Check for NaN in simulation.
OS: Ubuntu 20
TF version: tested on 2.3 and 2.4
tf2rl: Master
Your input would be appreciated.
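mujoco_py converts MuJoCo warnings into this exception and suggests checking for NaN. One thing worth ruling out is a NaN or out-of-range action reaching `env.step()`. This is an assumption, not a confirmed diagnosis of this report. Below is a hypothetical helper (`safe_action` is not part of tf2rl or gym) that fails fast on non-finite actions and clips to the action-space bounds.

```python
import numpy as np


def safe_action(action, low, high):
    """Hypothetical guard: raise on non-finite actions, otherwise clip to [low, high]."""
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        raise ValueError(f"non-finite action: {action}")
    return np.clip(action, low, high)
```

For example, `self._env.step(safe_action(action, env.action_space.low, env.action_space.high))` uses the `low`/`high` bounds of gym's `Box` action space. If the `ValueError` fires, the NaN originates in the policy network rather than in the simulator.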

Answer 4 · 2021-07-25T14:06:01.000Z

Hi @sasayesh , thanks for reporting the error. I'll try to reproduce the bug on Wednesday.