safety_rl


Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Python 3.8

This repository implements a model-free reach-avoid reinforcement learning (RARL) algorithm that provides safety and liveness guarantees, and it contains example uses and benchmark evaluations of the proposed algorithm on a range of nonlinear systems. RARL is primarily developed by Kai-Chieh Hsu, a PhD student in the Safe Robotics Lab, and Vicenç Rubies-Royo, a postdoc in the Hybrid Systems Lab.

The repository also serves as the companion code to our RSS 2021 paper, where you can find the theoretical properties of the proposed algorithm as well as the implementation details. All experiments in the paper are included as examples in this repository, and you can replicate the results with the commands described in Section II below. With some simple modifications, you can also replicate the results of the preceding ICRA 2019 paper, which considers the special case of reachability/safety only.

This tool is designed to work with arbitrary reinforcement learning environments and uses two scalar signals (a target margin and a safety margin) rather than a single scalar reward signal. You just need to add your environment under gym_reachability and register it through the standard method in gym; you can refer to the examples provided there. This tool learns the reach-avoid set through trial-and-error interactions with the environment, so it is not in itself a safe learning algorithm. However, it can be used in conjunction with an existing safe learning scheme, such as "shielding", to enable learning with safety guarantees (see Script 4 below, as well as Section IV.B in the RSS 2021 paper, for an example).
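
As a rough sketch, registering a new environment follows the standard gym mechanism. The id, module path, and class name below are placeholders for illustration, not names defined by this repository:

    # Hypothetical sketch: register a custom reach-avoid environment with gym so the
    # training scripts can instantiate it. Replace the id and entry_point with the
    # module and class you add under gym_reachability.
    from gym.envs.registration import register
    import gym

    register(
        id="my_reach_avoid_env-v0",
        entry_point="gym_reachability.my_reach_avoid_env:MyReachAvoidEnv",
    )

    env = gym.make("my_reach_avoid_env-v0")
    # The environment should expose the two scalar signals used by RARL
    # (a target margin and a safety margin) in place of a single reward.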

The implementation of tabular Q-learning is adapted from Denny Britz's implementation and the implementation of double deep Q-network and replay memory is adapted from PyTorch's tutorial (by Adam Paszke).

I. Dependencies

If you are using Anaconda to manage packages, you can use one of the following commands to create an identical environment from the specification file:

conda create --name <myenv> --file doc/spec-mac.txt
conda create --name <myenv> --file doc/spec-linux.txt

Otherwise, you can install the following packages manually:

  1. numpy=1.21.1
  2. pytorch=1.9.0
  3. gym=0.18.0
  4. scipy=1.7.0
  5. matplotlib=3.4.2
  6. box2d-py=2.3.8
  7. shapely=1.7.1
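
A roughly equivalent pip command is sketched below (note that PyTorch is distributed on PyPI as torch; availability of these exact versions may depend on your platform):

    pip install numpy==1.21.1 torch==1.9.0 gym==0.18.0 scipy==1.7.0 matplotlib==3.4.2 box2d-py==2.3.8 shapely==1.7.1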

II. Replicating the results in the RSS 2021 paper

Each script automatically generates a folder under experiments/ containing visualizations of the training process and the weights of the trained model. In addition, each script generates a train.pkl file, which contains the following (a short loading sketch follows the list):

  • training loss
  • training accuracy
  • trajectory rollout outcome starting from a grid of states
  • action taken from a grid of states
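
As a minimal sketch for inspecting this file (the folder placeholder and the structure of the stored object are assumptions, not a documented interface):

    # Minimal sketch: load and inspect a train.pkl produced by one of the scripts below.
    # The folder name under experiments/ is a placeholder; use the one created by your run.
    import pickle

    with open("experiments/<your_run_folder>/train.pkl", "rb") as f:
        records = pickle.load(f)

    # Expect fields such as training loss, training accuracy, rollout outcomes,
    # and actions evaluated on a grid of states.
    print(type(records))
    print(list(records.keys()) if hasattr(records, "keys") else records)
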
  1. Lunar lander in Figure 1:
    python3 sim_lunar_lander.py -sf
  2. Point object in Figure 2:
    python3 sim_naive.py -w -sf -a -g 0.9 -mu 12000000 -cp 600000 -ut 20 -n anneal
  3. Point object in Figure 3:
    python3 sim_naive.py -sf -g 0.9999 -n 9999
  4. Point object in Figure 4:
    python3 sim_show.py -sf -g 0.9999 -n 9999
  5. Dubins car in Figure 5:
    python3 sim_car_one.py -sf -w -wi 5000 -g 0.9999 -n 9999
  6. Dubins car (attack-defense game) in Figure 7 (Section IV.D):
    python3 sim_car_pe.py -sf -w -wi 30000 -g 0.9999 -n 9999

Paper Citation

If you use this code or find it helpful, please consider citing the companion RSS 2021 paper as:

@INPROCEEDINGS{hsu2021safety,
    AUTHOR    = {Kai-Chieh Hsu$^*$ and Vicenç Rubies-Royo$^*$ and Claire J. Tomlin and Jaime F. Fisac},
    TITLE     = {Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2021},
    ADDRESS   = {Virtual},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2021.XVII.077}
}