A Pragmatic Look at Deep Imitation Learning

MIT License

Imitation learning algorithms (with PPO [1]):

python main.py --imitation [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED]
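As a rough illustration of how the adversarial methods above differ, GAIL [7] and AIRL [3] can be seen as deriving different surrogate rewards from the discriminator output D(s, a); a minimal sketch (function names are illustrative, not this repo's API):

```python
import math

def gail_reward(d):
    # GAIL-style surrogate reward from discriminator output d = D(s, a) in (0, 1).
    # One common form: -log(1 - d) grows as the discriminator judges the
    # sample more expert-like.
    return -math.log(1.0 - d)

def airl_reward(d):
    # AIRL-style reward: the discriminator's logit, log D - log(1 - D),
    # which is zero when the discriminator is maximally uncertain (d = 0.5).
    return math.log(d) - math.log(1.0 - d)
```

For example, `airl_reward(0.5)` is exactly 0, while `gail_reward(0.5) = log 2`; both increase monotonically in d.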

Options include:

  • State-only imitation learning: --state-only
  • Absorbing state indicator [12]: --absorbing
  • R1 gradient regularisation [13]: --r1-reg-coeff 1 (default)
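The R1 regulariser [13] penalises the squared gradient norm of the discriminator on expert (real) data only. A minimal PyTorch sketch, assuming a discriminator that maps a batch of states to scalar logits (the function name and signature are illustrative, not this repo's API):

```python
import torch

def r1_penalty(discriminator, expert_states, coeff=1.0):
    # R1 gradient regularisation [13]: (coeff / 2) * E[||grad_x D(x)||^2],
    # computed on expert (real) samples only.
    expert_states = expert_states.detach().requires_grad_(True)
    d_out = discriminator(expert_states)
    # create_graph=True keeps the penalty differentiable w.r.t. the
    # discriminator parameters, so it can be added to the discriminator loss.
    grad = torch.autograd.grad(d_out.sum(), expert_states, create_graph=True)[0]
    return coeff / 2 * grad.pow(2).sum(dim=1).mean()
```

For a linear discriminator D(x) = w·x + b, the gradient w.r.t. the input is w for every sample, so the penalty reduces to (coeff / 2)·||w||², which makes the sketch easy to sanity-check.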

Results

PPO

[Train/test return plots: ppo_train_returns, ppo_test_returns]

AIRL

[Train/test return plots: airl_train_returns, airl_test_returns]

BC

[Train/test return plots: bc_train_returns, bc_test_returns]

DRIL

[Train/test return plots: dril_train_returns, dril_test_returns]

FAIRL

[Train/test return plots: fairl_train_returns, fairl_test_returns]

GAIL

[Train/test return plots: gail_train_returns, gail_test_returns]

GMMIL

[Train/test return plots: gmmil_train_returns, gmmil_test_returns]

nn-PUGAIL

[Train/test return plots: pugail_train_returns, pugail_test_returns]

RED

[Train/test return plots: red_train_returns, red_test_returns]

Acknowledgements

Citation

If you find this work useful and would like to cite it, please use the following:

@misc{arulkumaran2020pragmatic,
  author = {Arulkumaran, Kai},
  title = {A Pragmatic Look at Deep Imitation Learning},
  url = {https://github.com/Kaixhin/imitation-learning},
  year = {2020}
}

References

[1] Proximal Policy Optimization Algorithms
[2] Adversarial Behavioral Cloning
[3] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[4] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[5] Disagreement-Regularized Imitation Learning
[6] A Divergence Minimization Perspective on Imitation Learning Methods
[7] Generative Adversarial Imitation Learning
[8] Imitation Learning via Kernel Mean Embedding
[9] Positive-Unlabeled Reward Learning
[10] Primal Wasserstein Imitation Learning
[11] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[12] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
[13] Which Training Methods for GANs do actually Converge?