[AAAI 2023] Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning

This repository contains the PyTorch code for the paper "Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning" in AAAI 2023. [Paper][Appendix]


Experiments were run with Python 3.6 and these packages:

  • pytorch == 1.1.0
  • gym == 0.15.7
  • mujoco-py ==

Data Collection

We provide two kinds of imperfect demonstration data, D1 and D2, to evaluate the performance of UID. We first train an optimal policy $\pi_o$ with TRPO and use it to sample the optimal demonstrations $D_o$. The imperfect demonstrations are collected from 3 non-optimal demonstrators $\pi_n$. In D1, the $\pi_n$ are 3 checkpoints of increasing quality saved during RL training. In D2, we add Gaussian noise of scale $\xi$ to the action $a^\ast$ of $\pi_o$ to form each non-optimal policy $\pi_n$: its action is modeled as $a\sim\mathcal{N}(a^\ast, \xi^2)$, with $\xi=[0.25, 0.4, 0.6]$ for the 3 non-optimal policies (i.e., $\pi_{n_3}$, $\pi_{n_2}$ and $\pi_{n_1}$).
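As a rough illustration of the D2 noise model $a\sim\mathcal{N}(a^\ast, \xi^2)$, the sketch below perturbs a single action with each of the three noise levels. It is not the repository's data-collection code: the function name `perturb_action` and the example action `a_star` are hypothetical, the noise is assumed to be independent per action dimension, and any clipping to the environment's action bounds is left out because the text does not specify it.

```python
import random

# Noise levels from the text, in the order given there
# (pi_{n_3}, pi_{n_2}, pi_{n_1}).
NOISE_STDS = [0.25, 0.4, 0.6]


def perturb_action(optimal_action, xi, rng):
    """Sample a ~ N(a*, xi^2), independently for each action dimension.

    Hypothetical helper for illustration; not part of the repository.
    """
    return [rng.gauss(a, xi) for a in optimal_action]


rng = random.Random(0)          # fixed seed so the sketch is reproducible
a_star = [0.1, -0.3, 0.5]       # made-up action standing in for pi_o's output
noisy_actions = {xi: perturb_action(a_star, xi, rng) for xi in NOISE_STDS}

for xi, action in noisy_actions.items():
    print(f"xi={xi}: {action}")
```

A larger $\xi$ spreads the sampled actions further from $a^\ast$, which is why the three settings yield demonstrators of different quality.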

The quality of each demonstrator is provided in the appendix.

Train UID

 python uid_main.py --env_id 1/2/3 --il_method uid/uidwail --c_data 1/2 --seed 0/1/2/3/4
 python uid_main.py --env_id 1/2/3 --il_method gail/irl/vail --c_data 1/2 --seed 0/1/2/3/4
 python uid_main.py --env_id 1/2/3 --il_method iwil/icgail --c_data 1/2 --seed 0/1/2/3/4
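The slash-separated values in the commands above presumably list the alternatives for each flag, with one value chosen per run. Under that assumption, a small script such as the following sketch can enumerate every combination for the UID variants. The helper `build_commands` is hypothetical and only builds the command strings; it does not launch them.

```python
from itertools import product

# Option values copied from the commands above; each run is assumed
# to take exactly one value per flag.
ENV_IDS = [1, 2, 3]
METHODS = ["uid", "uidwail"]
C_DATA = [1, 2]
SEEDS = [0, 1, 2, 3, 4]


def build_commands(env_ids, methods, c_data, seeds):
    """Return one command string per (env, method, dataset, seed) combination."""
    return [
        f"python uid_main.py --env_id {e} --il_method {m} --c_data {c} --seed {s}"
        for e, m, c, s in product(env_ids, methods, c_data, seeds)
    ]


commands = build_commands(ENV_IDS, METHODS, C_DATA, SEEDS)
for cmd in commands:
    print(cmd)
```

Swapping `METHODS` for `["gail", "irl", "vail"]` or `["iwil", "icgail"]` gives the corresponding lists for the compared methods.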

Among the other compared methods, re-implementations of T-REX and D-REX can be found in trex_main.py.


For any questions, please feel free to contact me. (Email: yunke.wang@whu.edu.cn)


  title={Unlabeled imperfect demonstrations in adversarial imitation learning},
  author={Wang, Yunke and Du, Bo and Xu, Chang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},


