Codebase for my paper: Hierarchical Adversarial Inverse Reinforcement Learning
Language: Python
The following parts are included:
- Benchmarks built with MuJoCo, including Hopper, Walker, Ant box-pushing, and Point maze.
- An implementation of the hierarchical imitation learning (HIL) algorithm proposed in our paper.
- Implementations of state-of-the-art IL and HIL algorithms as baselines, including GAIL, AIRL, Option-GAIL, and Directed-Info GAIL.
The paper is available at:
Please cite this paper:
author = {Jiayu Chen and
Tian Lan and
Vaneet Aggarwal},
title = {Hierarchical Adversarial Inverse Reinforcement Learning},
journal = {CoRR},
volume = {abs/2210.01969},
year = {2022},
url = {},
doi = {10.48550/arXiv.2210.01969}
- Ubuntu 18.04
- python 3.6
- pytorch 1.6
- tensorboard 2.5
- mujoco_py >= 1.5
- gym == 0.19.0
- matplotlib
- tqdm
- seaborn
- ...
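Before launching a run, it can help to confirm that the listed packages are importable. The following is a minimal sketch, not part of the codebase: the helper `missing_modules` is hypothetical, and the import names (for example `torch` for pytorch) are assumptions based on how these packages are usually imported. It checks only that each module can be found, not that the installed version matches the list above.

```python
import importlib.util

# Import names assumed for the packages listed above (pytorch is imported as "torch").
REQUIRED_MODULES = [
    "torch",
    "tensorboard",
    "mujoco_py",
    "gym",
    "matplotlib",
    "tqdm",
    "seaborn",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules were found.")
```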
You need to first enter the folder 'HierAIRL_Hopper'.
To run the code with specific algorithms:
# Option-GAIL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag option-gail-1k --algo option_gail
# GAIL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag gail-1k --algo gail
# Directed-Info GAIL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_pretrain_epoch 50 --n_demo 1000 --device "cuda:0" --tag d_info_gail-1k --algo DI_gail
# Option-AIRL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag option-airl-1k --algo option_airl
# H-AIRL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag hier-airl-1k --algo hier_airl
# H-GAIL:
python ./ --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag hier-gail-1k --algo hier_gail
To run the code with a specific random seed Y (we simply use 0, 1, or 2), append '--seed=Y' to the command. The same applies to the other tasks below.
For the hyperparameters, please refer to 'HierAIRL_Hopper/'. The same applies to the other tasks below.
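Repeating a run over the seeds 0, 1, and 2 can be scripted. The sketch below only builds the command lines and does not execute them. The helper `seed_commands` is hypothetical and not part of the codebase, and `script` is a placeholder for the entry-point script of the folder you are in, which you need to supply yourself. The flags are the ones used in the commands above.

```python
import shlex


def seed_commands(script, env_name, n_demo, tag, algo,
                  seeds=(0, 1, 2), device="cuda:0"):
    """Return one argument list per seed, using the flags shown in this README."""
    commands = []
    for seed in seeds:
        commands.append([
            "python", script,
            "--env_type", "mujoco",
            "--env_name", env_name,
            "--n_demo", str(n_demo),
            "--device", device,
            "--tag", tag,
            "--algo", algo,
            f"--seed={seed}",
        ])
    return commands


if __name__ == "__main__":
    # "<script>" is a placeholder: replace it with the entry-point script.
    for cmd in seed_commands("<script>", "Hopper-v2", 1000,
                             "hier-airl-1k", "hier_airl"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each returned list can be passed directly to `subprocess.run` if you prefer to launch the runs from Python rather than copy the printed lines into a shell.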
You need to first enter the folder 'HierAIRL_Walker'.
To run the code with specific algorithms:
# Option-GAIL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail
# GAIL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail
# Directed-Info GAIL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_pretrain_epoch 50 --n_demo 5000 --device "cuda:0" --tag d_info_gail-5k --algo DI_gail
# Option-AIRL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag option-airl-5k --algo option_airl
# H-AIRL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl
# H-GAIL:
python ./ --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag hier-gail-5k --algo hier_gail
You need to first enter the folder 'HierAIRL_Ant'.
To run the code with specific algorithms:
# Option-GAIL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag option-gail-10k --algo option_gail
# GAIL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag gail-10k --algo gail
# Directed-Info GAIL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_pretrain_epoch 100 --n_demo 10000 --device "cuda:0" --tag d_info_gail-10k --algo DI_gail
# Option-AIRL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag option-airl-10k --algo option_airl
# H-AIRL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag hier-airl-10k --algo hier_airl
# H-GAIL:
python ./ --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag hier-gail-10k --algo hier_gail
You need to first enter the folder 'HierAIRL_Point'.
- To reproduce the results of expert trajectories, please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.
python ./ --env_type mujoco --env_name XXX
- To reproduce the results of trajectories of the learned agents, please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.
python ./ --env_type mujoco --env_name XXX
- To reproduce the learned agents with H-AIRL (i.e., the checkpoints), please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.
python ./ --env_type mujoco --env_name XXX --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl
You need to first enter the folder 'HierAIRL_Point_Room_transfer'.
To run the code with specific algorithms, please run the following commands, where X can be 0, 1, or 2.
# Option-GAIL:
python ./ --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail --seed X
# GAIL:
python ./ --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail --seed X
# H-AIRL:
python ./ --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 0
# H-AIRL initialized with the checkpoint trained in another task:
python ./ --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 1
You need to first enter the folder 'HierAIRL_Point_Corridor_transfer'.
To run the code with specific algorithms, please run the following commands, where X can be 0, 1, or 2.
# Option-GAIL:
python ./ --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail --seed X
# GAIL:
python ./ --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail --seed X
# H-AIRL:
python ./ --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 0
# H-AIRL initialized with the checkpoint trained in another task:
python ./ --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 1