Reward-pytorch

[WIP] PyTorch implementations of reward-learning algorithms, e.g., inverse reinforcement learning and preference-based reinforcement learning.

Implemented Algorithms

Inverse Reinforcement Learning

Generative Adversarial Imitation Learning (GAIL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning (AIRL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning (DAC) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Imitation Learning via Off-Policy Distribution Matching (ValueDICE) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
IQ-Learn: Inverse soft-Q Learning for Imitation (IQ-Learn) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
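
Several of the adversarial methods above (GAIL, AIRL, DAC) derive the policy's reward from a discriminator. The sketch below shows two common reward forms computed from a discriminator logit. It is an illustration only, not this repository's implementation: the function names are hypothetical, it uses plain Python rather than PyTorch to stay dependency-free, and it assumes the convention that D = sigmoid(logit) is the probability that a state-action pair came from the expert.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gail_style_reward(logit: float) -> float:
    """Reward -log(1 - D), which equals softplus(logit).

    Written in the numerically stable softplus form.
    """
    return max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))


def airl_style_reward(logit: float) -> float:
    """Reward log D - log(1 - D), which simplifies to the logit itself."""
    d = sigmoid(logit)
    return math.log(d) - math.log(1.0 - d)


if __name__ == "__main__":
    for logit in (-2.0, 0.0, 2.0):
        print(logit, gail_style_reward(logit), airl_style_reward(logit))
```

The first form is always positive, while the second can take either sign. This difference in sign behavior is the kind of reward bias that the DAC paper title refers to.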

Preference-based Reinforcement Learning

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations (TREX) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations (DREX) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Preference Transformer: Modeling Human Preferences using Transformers for RL (PT) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Contrastive Preference Learning: Learning from Human Feedback without RL (CPL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
Inverse Preference Learning: Preference-based RL without a Reward Function (IPL) [arXiv] [annotated PDF] [official code] [reproduce-env.yaml]
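
Several of the preference-based methods above train a reward model with a Bradley-Terry preference likelihood over pairs of trajectory segments. The following is a minimal sketch of that loss in plain Python, assuming per-step rewards have already been predicted for each segment. The function names are hypothetical and this is not the code used in this repository.

```python
import math


def preference_prob(rewards_a, rewards_b):
    """Bradley-Terry probability that segment a is preferred over segment b.

    Each argument is a sequence of per-step predicted rewards.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss; label is 1.0 if a is preferred, 0.0 if b is."""
    p = preference_prob(rewards_a, rewards_b)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


if __name__ == "__main__":
    seg_a = [0.5, 0.7, 0.9]  # made-up reward predictions for illustration
    seg_b = [0.1, 0.2, 0.3]
    print(preference_prob(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, label=1.0))
```

Minimizing this loss pushes the summed reward of the preferred segment above that of the other segment.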

Installation

git clone https://github.com/BepfCp/Reward-pytorch.git
cd Reward-pytorch
pip install -e .

Run Experiment

python example.py agent=irl/gail env.id=Hopper-v4 dataset=./data/demo/hopper_expert.h5
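
The `key=value` arguments in the command above resemble dotted configuration overrides. As a rough sketch of how such arguments can map onto a nested configuration, the hypothetical helper below parses them into a dictionary. How `example.py` actually loads its configuration is not shown here, so treat this as an assumption-laden illustration rather than the repository's loader.

```python
def parse_overrides(args):
    """Turn ["a.b=c", ...] into a nested dict {"a": {"b": "c"}, ...}.

    Values are kept as strings; a real config loader would also cast types.
    """
    config = {}
    for arg in args:
        key, _, value = arg.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Descend into (or create) the nested section for each key part.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


overrides = parse_overrides([
    "agent=irl/gail",
    "env.id=Hopper-v4",
    "dataset=./data/demo/hopper_expert.h5",
])

if __name__ == "__main__":
    print(overrides)
```

Under this reading, `env.id=Hopper-v4` sets the `id` field inside an `env` section, while `agent` and `dataset` are top-level entries.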

Acknowledgement

Thanks to the following tutorial, blog and open-source code: