Constrained Exploration and Recovery from Experience Shaping

Constrained Exploration and Recovery from Experience Shaping is a model-free reinforcement learning algorithm that actively reshapes the action space of an agent during training, so that reward-driven exploration stays within safety limits.
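To make the idea of reshaping the action space concrete, here is a minimal sketch. It assumes a single linear constraint of the form g . a <= h on the action a. That form, and the function name project_onto_halfspace, are illustrative assumptions and are not taken from this repository's code. The sketch replaces a proposed action with the closest action that satisfies the constraint.

```python
def project_onto_halfspace(action, g, h):
    """Return the point closest to `action` (Euclidean distance) with g . a <= h.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # Signed amount by which the proposed action violates the constraint.
    violation = sum(gi * ai for gi, ai in zip(g, action)) - h
    if violation <= 0.0:
        # Already inside the allowed region: leave the action unchanged.
        return list(action)
    norm_sq = sum(gi * gi for gi in g)
    if norm_sq == 0.0:
        raise ValueError("infeasible constraint: g is zero and h is negative")
    # Move back along the constraint normal just far enough to reach the boundary.
    scale = violation / norm_sq
    return [ai - scale * gi for ai, gi in zip(action, g)]


# The first action component is limited to at most 0.5.
safe_action = project_onto_halfspace([1.0, 1.0], g=[1.0, 0.0], h=0.5)
print(safe_action)  # [0.5, 1.0]
```

With several constraints at once there is no such closed form in general, which is where a quadratic program solver becomes useful.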

This repository accompanies the following paper on arXiv: https://arxiv.org/abs/1809.08925

Figures: Unconstrained Random Exploration; Constrained Random Exploration

Installing

This implementation requires Python 3 and relies on TensorFlow for building and training constraint networks. Depending on your setup, run:

pip install tensorflow-gpu

if you have a CUDA-compatible device or:

pip install tensorflow

For training constraint networks together with control policies, we built on top of the OpenAI Baselines framework. Install it with:

pip install baselines

We will maintain compatibility with the OpenAI Baselines master branch (last confirmed check on 2018-09-08: commit). Please create an issue if you notice something wrong.

Quadratic program solving is performed using quadprog. First install Cython:

pip install Cython

Then:

pip install quadprog

Finally, clone this repository and install the local package with pip:

git clone git@github.com:IBM/constrained-rl.git
cd constrained-rl
pip install -e .

Examples

Examples and reference data are provided in the examples directory:

  1. Learning action space constraints from positive and negative demonstrations: fixed maze
  2. Learning action space constraints from scratch: random obstacles with position and force control

License

The Constrained Exploration and Recovery from Experience Shaping Project uses the MIT software license.

Contributing to the project

Full details of how to contribute to this project are documented in the CONTRIBUTING.md file.

Maintainers

The project's maintainers are responsible for reviewing and merging all pull requests, and they guide the overall technical direction of the project.