Imitation-Learning-Paper-Lists

A collection of papers on imitation learning in RL, with brief introductions. This collection draws on Awesome-Imitation-Learning and also contains self-collected papers.

To be precise, "imitation learning" is the general problem of learning from expert demonstrations (LfD). For historical reasons, two names derive from this description: Imitation Learning and Apprenticeship Learning. Apprenticeship learning is usually mentioned in the context of "Apprenticeship learning via inverse reinforcement learning (IRL)", which recovers the reward function and learns policies from it, while imitation learning began with behavior cloning, which learns the policy directly (ref and by Morgan-Kaufmann, NIPS 1989). However, as research in the area has developed, "imitation learning" has come to denote the general LfD problem setting, which is also our point of view.

Different settings of imitation learning give rise to different research areas. In the general setting, the learner (1) only has access to pre-collected trajectories ((s,a) pairs) from a non-interactive expert, (2) can interact with the environment (with simulators), and (3) receives no reward signals. Some of the other settings are listed below:

  1. No actions, only states / observations -> Imitation Learning From Observations (ILFO).

  2. With reward signals -> Imitation Learning with Rewards.

  3. Interactive expert for corrections and data aggregation -> On-policy Imitation Learning (beginning with DAgger, Dataset Aggregation).

  4. Cannot interact with the environment -> A special case of Batch RL (see a dedicated list here; data in Batch RL can contain more than expert demos).
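As an illustration of setting 3, the loop below is a minimal sketch of dataset aggregation in the spirit of DAgger. It makes several simplifying assumptions that are ours, not the original algorithm's: the policy is a lookup table, the learner's own policy drives every rollout (no mixing with the expert), and the names `expert`, `step`, and `default_action` are hypothetical placeholders for an interactive expert and a simulator.

```python
from collections import Counter, defaultdict


def train(dataset):
    """Supervised step: pick the most frequent expert action per state."""
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def dagger(expert, step, start_state, horizon, iterations, default_action):
    """Roll out the learner, let the expert relabel visited states, retrain.

    expert(state) -> action and step(state, action) -> next_state are
    hypothetical callables supplied by the caller.
    """
    dataset = []  # aggregated over all iterations, never reset
    policy = {}
    for _ in range(iterations):
        state = start_state
        for _ in range(horizon):
            # The expert labels the state the learner actually reached.
            dataset.append((state, expert(state)))
            # The learner's own action (or a default) drives the rollout.
            action = policy.get(state, default_action)
            state = step(state, action)
        policy = train(dataset)
    return policy
```

Because the states come from the learner's own rollouts, the aggregated dataset covers situations that a fixed set of expert trajectories might never contain.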

What we want from imitation learning in different settings (for the real world):

  1. Interact less with the real-world environment by using expert demonstrations, improving sample efficiency while still learning good policies. (Some works, however, learn good policies from few demonstrations but at a vast cost in environment interaction.)

  2. Real-world actions are unavailable or hard to sample.

  3. Use expert data to improve sample efficiency and learn quickly with good exploration ability.

  4. Online settings that humans can easily join, e.g., a human correcting the steering wheel in a self-driving car.

  5. Learn good policies in the real world, where interacting with the environment is difficult.

In this collection, we concentrate on the general setting and collect the other settings in the "Other Settings" section. We do not regard settings such as "Self-imitation learning", which imitates a policy from one's own historical data, as imitation learning tasks.

These papers are classified mainly by their methodology instead of their specific task settings (except for the single-agent/multi-agent distinction), but since many papers span several categories, the classification is for reference only. As you can see, many works focus on robotics, especially papers from UCB.

Overview

Single-Agent

Reviews & Tutorials

Behavior Cloning

Behavior Cloning (BC) directly replicates the expert’s behavior with supervised learning, and can be improved via data aggregation. One can say that BC is the simplest case of interactive direct policy learning.
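To make the supervised-learning view concrete, here is a minimal sketch of behavior cloning for discrete states and actions. It assumes the demonstrations are given as (state, action) pairs and uses a simple majority vote per state as the "classifier"; the function names are illustrative and not taken from any paper in this list. Practical BC typically fits a neural network instead.

```python
from collections import Counter, defaultdict


def fit_behavior_cloning(demonstrations):
    """Fit a tabular policy from expert (state, action) pairs.

    For each state, the cloned policy returns the action the expert
    chose most often in that state.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def act(policy, state, default_action=None):
    """Return the cloned action, or a default for states never demonstrated."""
    return policy.get(state, default_action)
```

The fallback in `act` hints at the main weakness of plain BC: the policy has no guidance for states the expert never visited, which is the gap that data aggregation tries to close.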

One-shot / Zero-shot

Model-based

Hierarchical RL

Multi-modal Behaviors

Learning with human preference

Inverse RL

Inverse Reinforcement Learning (IRL) learns the hidden objectives behind the expert’s behavior.
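One common way to make "hidden objectives" concrete is to assume the reward is linear in state features and to compare discounted feature expectations of the expert and of a candidate policy, as in apprenticeship learning via IRL. The sketch below shows only that ingredient, under the stated linear-reward assumption; `featurize`, the trajectory format (lists of states), and the function names are hypothetical.

```python
def feature_expectations(trajectories, featurize, gamma=0.9):
    """Average discounted sum of state features over a set of trajectories."""
    dim = len(featurize(trajectories[0][0]))
    mu = [0.0] * dim
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            phi = featurize(state)
            for k in range(dim):
                mu[k] += (gamma ** t) * phi[k]
    return [m / len(trajectories) for m in mu]


def reward_weights(mu_expert, mu_policy):
    """Linear reward direction under which the expert beats the policy.

    With reward(s) = w . featurize(s), choosing w = mu_expert - mu_policy
    assigns the expert a discounted return at least as high as the policy's.
    """
    return [e - p for e, p in zip(mu_expert, mu_policy)]
```

A full IRL method would alternate between updating the reward in this way and re-solving the RL problem under the new reward; that outer loop is omitted here.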

Reviews & Tutorials

Papers

Bayesian Methods

Generative Adversarial Methods

Generative Adversarial Imitation Learning (GAIL) applies generative adversarial training to learning expert policies; it is derived from inverse RL.
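For orientation, the saddle-point objective usually associated with GAIL can be written as below. The notation is ours: \pi is the learner's policy, \pi_E the expert policy, D a discriminator mapping state-action pairs to (0, 1), H(\pi) the causal entropy of the policy, and \lambda \ge 0 an entropy weight.

```latex
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\big[\log D(s,a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big]
  - \lambda H(\pi)
```

The discriminator is trained to tell learner samples from expert samples, while the policy is updated with an RL step that treats the discriminator's output as a cost signal.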

Multi-modal Behaviors

Hierarchical RL

Task Transfer

Model-based

POMDP

Fixed Reward Methods

A recent paper proposes a new idea for imitation learning: learning a fixed reward signal, which obviates the need to dynamically update the reward function.

Goal-based methods

Bayesian Methods

Other Methods

Multi-Agent

MA Inverse RL

MA-GAIL

Other Settings

Imitation Learning from Observations

Review Papers

Regular Papers

Imitation Learning with rewards

On-policy Imitation Learning

Batch RL

See the dedicated list here.

Applications