[WIP] PyTorch implementations of reward-learning algorithms, including inverse reinforcement learning and preference-based reinforcement learning.
Inverse Reinforcement Learning
- Generative Adversarial Imitation Learning (GAIL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning (AIRL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning (DAC) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Imitation Learning via Off-Policy Distribution Matching (ValueDICE) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- IQ-Learn: Inverse soft-Q Learning for Imitation (IQ-Learn) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
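The adversarial methods in this list (GAIL, AIRL, DAC) share one core idea: a discriminator is trained to tell expert state-action pairs from policy ones, and its output is turned into a reward for the policy. The sketch below illustrates that idea in plain Python so it runs without dependencies. It is not this repository's code. The function names are hypothetical, and the labelling convention (expert = 1, policy = 0) is an assumption that varies between implementations.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(expert_logits, policy_logits):
    """Binary cross-entropy on discriminator logits.

    Assumed convention: expert pairs are labelled 1, policy pairs 0,
    with D(s, a) = sigmoid(logit).
    """
    # -log D(s, a) for expert samples equals softplus(-logit).
    expert_term = sum(softplus(-l) for l in expert_logits) / len(expert_logits)
    # -log(1 - D(s, a)) for policy samples equals softplus(logit).
    policy_term = sum(softplus(l) for l in policy_logits) / len(policy_logits)
    return expert_term + policy_term


def gail_reward(logit: float) -> float:
    """GAIL-style surrogate reward r = -log(1 - D(s, a)); always positive."""
    return softplus(logit)


def airl_reward(logit: float) -> float:
    """AIRL-style reward form log D - log(1 - D), which equals the logit."""
    return logit


# A pair the discriminator rates as expert-like earns a larger reward.
low, high = gail_reward(-2.0), gail_reward(2.0)
```

The always-positive GAIL form and the sign-free AIRL form behave differently in environments with early termination, which is the reward-bias issue the DAC paper's title refers to.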
Preference-based Reinforcement Learning
- Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations (TREX) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations (DREX) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Preference Transformer: Modeling Human Preferences using Transformers for RL (PT) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Contrastive Preference Learning: Learning from Human Feedback without RL (CPL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
- Inverse Preference Learning: Preference-based RL without a Reward Function (IPL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
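Several of the preference-based methods above, TREX among them, train a reward model with a Bradley-Terry objective: the segment with the larger summed predicted reward should be the preferred one. The sketch below shows that loss in plain Python. It is an illustration under assumptions, not this repository's implementation. The function names are hypothetical, and a real reward model would produce the per-step rewards from a network.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_prob(rewards_a, rewards_b) -> float:
    """Bradley-Terry probability that segment A is preferred over B.

    P(A > B) = exp(R_A) / (exp(R_A) + exp(R_B)) = sigmoid(R_A - R_B),
    where R is the sum of predicted per-step rewards in a segment.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # sigmoid(diff) written via softplus to avoid overflow.
    return math.exp(-softplus(-diff))


def preference_loss(rewards_a, rewards_b, label: float) -> float:
    """Cross-entropy between the label and the Bradley-Terry probability.

    label = 1.0 means A is preferred, 0.0 means B is preferred.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # -log P(A > B) = softplus(-diff); -log P(B > A) = softplus(diff).
    return label * softplus(-diff) + (1.0 - label) * softplus(diff)


# Segment A has the higher predicted return, so agreeing with the
# label "A preferred" costs less than disagreeing with it.
seg_a, seg_b = [0.5, 1.0, 1.5], [0.2, 0.1, 0.3]
agree = preference_loss(seg_a, seg_b, label=1.0)
disagree = preference_loss(seg_a, seg_b, label=0.0)
```

Minimizing this loss over many labelled pairs pushes the reward model to rank segments the way the labels do.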
git clone https://github.com/BepfCp/Reward-pytorch.git
cd Reward-pytorch
pip install -e .
python example.py agent=irl/gail env.id=Hopper-v4 dataset=./data/demo/hopper_expert.h5
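The `key=value` arguments in the last command above resemble Hydra-style overrides, though this README does not say which configuration library `example.py` uses, so treat that as an assumption. The sketch below only illustrates how such dotted keys could map onto a nested config. `parse_overrides` is a hypothetical helper, not part of this repository.

```python
def parse_overrides(args):
    """Turn ["env.id=Hopper-v4", ...] into a nested dict of strings.

    Illustrative only: real config libraries also handle types,
    defaults, and config groups such as "agent=irl/gail".
    """
    config = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        node = config
        *parents, leaf = key.split(".")
        # Walk or create one nested dict per dotted prefix.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


overrides = parse_overrides([
    "agent=irl/gail",
    "env.id=Hopper-v4",
    "dataset=./data/demo/hopper_expert.h5",
])
```

Under this reading, `agent=irl/gail` selects the algorithm, `env.id` the environment, and `dataset` the path to the expert demonstrations.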
Thanks to the following tutorials, blogs, and open-source code:
- ...