Official code for the ICLR 2023 paper "Learning Soft Constraints from Constrained Expert Demonstrations"

Inverse Constraint Learning

Paper: Learning Soft Constraints From Constrained Expert Demonstrations, Gaurav et al. (2023)

This repository contains the code for the ICL paper. Every command below logs its results to TensorBoard.

How does it work?

Constrained RL (CRL) takes a reward and one or more constraints as input and produces an optimal constrained policy.

The inverse problem, Inverse Constrained RL, takes a dataset of trajectories sampled from an optimal expert and produces a reward and a set of constraints such that running CRL with them recovers the expert policy.

Inverse Constrained RL is difficult because the problem is unidentifiable. We therefore solve a simplified version: we assume the reward is known and that only a single constraint needs to be learned.
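As a rough sketch, the forward problem and the simplified inverse problem can be written as follows. The notation is an assumption made here for illustration: J_f(π) denotes the expected discounted sum of a function f under policy π, β is a constraint threshold, π_E is the expert policy, r is the known reward and c is the constraint. The paper gives the authoritative formulation.

```latex
% Expected discounted return of a function f under policy \pi (assumed notation)
J_{f}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t \ge 0} \gamma^{t} f(s_t, a_t) \right]

% Forward problem (constrained RL): r and c are given, find the policy
\pi^{*} \in \arg\max_{\pi} \; J_{r}(\pi)
\quad \text{subject to} \quad J_{c}(\pi) \le \beta

% Simplified inverse problem: r is known and trajectories from \pi_{E} are given;
% find a single constraint c such that the expert solves the forward problem
\pi_{E} \in \arg\max_{\pi} \; J_{r}(\pi)
\quad \text{subject to} \quad J_{c}(\pi) \le \beta
```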

The idea is inspired by the IRL template, which alternates between policy optimization and reward adjustment. In our case, we alternate between constrained policy optimization and constraint function adjustment.
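The alternation can be sketched as a short loop. This is a minimal illustration, not the code in icl.py: the function names (alternate, toy_optimize_policy, toy_adjust_constraint) and the one-dimensional toy problem are hypothetical stand-ins, and the actual optimization steps are described in the paper.

```python
from typing import Callable, List, Tuple


def alternate(
    constraint: int,
    optimize_policy: Callable[[int], int],
    adjust_constraint: Callable[[int, int], int],
    n_iters: int,
) -> Tuple[int, List[int]]:
    """Alternate between policy optimization and constraint adjustment."""
    history: List[int] = []
    for _ in range(n_iters):
        # Step 1: constrained policy optimization under the current constraint.
        policy = optimize_policy(constraint)
        history.append(policy)
        # Step 2: constraint adjustment based on the policy just obtained.
        constraint = adjust_constraint(constraint, policy)
    return constraint, history


# Toy stand-ins (assumptions for illustration only): an agent on a line is
# rewarded for moving right, and the expert was observed stopping at position 3.
EXPERT_POSITION = 3
MAX_POSITION = 10


def toy_optimize_policy(limit: int) -> int:
    # Reward grows with position, so the best feasible choice is the
    # right-most position the current limit allows.
    return min(MAX_POSITION, limit)


def toy_adjust_constraint(limit: int, position: int) -> int:
    # Tighten the limit by one step whenever the learner goes further
    # than the expert did; otherwise leave it unchanged.
    return limit - 1 if position > EXPERT_POSITION else limit


final_limit, history = alternate(
    MAX_POSITION, toy_optimize_policy, toy_adjust_constraint, n_iters=10
)
print(final_limit, history)
```

In this toy, the learned limit tightens until the learner's behaviour matches the expert's, which is the fixed point the alternation aims for.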

For further details regarding the optimization and algorithm, please see the paper.

We conduct experiments in synthetic environments, robotics environments and real-world highway environments. The steps to run them are detailed in the rest of this README.

Setup

  • Install OpenMPI and MuJoCo 2.1.0.
  • Update the constants in tools/__init__.py so that they point to the correct directories for the ExiD dataset.
  • Install the tools package by running pip install . in the root directory.

High-level workflow

  • If you encounter an OpenGL error, install Xvfb and prefix the command with xvfb-run -a.
  • In the commands below, replace:
    • SEED=1/2/3/4/5/anything
    • BETA=anything (if BETA=-1, the default defined in the config file is used)
    • ENV with the identifier of the environment:
      • Gridworld (A): ENV=gridworldA
      • Gridworld (B): ENV=gridworldB
      • CartPole (MR): ENV=cartpoleMR
      • CartPole (Mid): ENV=cartpoleM
      • HighD: ENV=highdgap
      • Ant-Constrained: ENV=ant
      • HalfCheetah-Constrained: ENV=hc
      • ExiD: ENV=exid
  • Expert data (either generate OR use saved data):
    • Use saved data: cp expert-data/data-ENV.pt data.pt
    • Generate for HighD environment: python3 -B expert_highD.py
    • Generate for ExiD environment: python3 -B expert_exiD.py (this uses data in tools/assets/exiD, already provided, which was generated using prepare_exid_data.py)
    • Generate for other environments: python3 -B expert.py -c configs/ENV.json
  • Run methods
    • ICL: python3 -B icl.py -c configs/ENV.json -seed SEED -beta BETA
    • GAIL-Constraint: python3 -B gail_constraint.py -c configs/gail-ENV.yaml -seed SEED
    • ICRL: python3 -B icrl.py -c configs/icrl-ENV.yaml -seed SEED
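To run a method over several seeds, the commands above can be assembled programmatically. The sketch below only builds and prints the command lines listed in this README; the helper names (icl_command, gail_constraint_command, icrl_command, xvfb_wrap) are hypothetical and not part of the repository.

```python
import shlex
from typing import List


def icl_command(env: str, seed: int, beta: float = -1) -> List[str]:
    # BETA=-1 selects the default defined in the config file.
    return ["python3", "-B", "icl.py", "-c", f"configs/{env}.json",
            "-seed", str(seed), "-beta", str(beta)]


def gail_constraint_command(env: str, seed: int) -> List[str]:
    return ["python3", "-B", "gail_constraint.py", "-c",
            f"configs/gail-{env}.yaml", "-seed", str(seed)]


def icrl_command(env: str, seed: int) -> List[str]:
    return ["python3", "-B", "icrl.py", "-c",
            f"configs/icrl-{env}.yaml", "-seed", str(seed)]


def xvfb_wrap(command: List[str]) -> List[str]:
    # Prefix for machines where an OpenGL error occurs.
    return ["xvfb-run", "-a"] + command


if __name__ == "__main__":
    # Print the ICL command for five seeds on Gridworld (A).
    for seed in range(1, 6):
        print(shlex.join(icl_command("gridworldA", seed)))
```

Each helper returns an argument list, so a command can be passed directly to subprocess.run from the repository root once the expert data is in place.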

Credits

Please check the individual repositories for licenses.