PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

Primary language: Python. License: MIT.

safe-control-gym

Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-based control, and model-free and model-based reinforcement learning (RL).

These environments include (and evaluate) symbolic safety constraints and implement input, parameter, and dynamics disturbances to test the robustness and generalizability of control approaches. [PDF]
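As a rough illustration of what "evaluating a constraint" and "applying an input disturbance" can mean, here is a minimal standard-library sketch. The function names (box_constraint_violations, disturb_input) and the Gaussian noise model are assumptions made for this example; they are not the safe-control-gym API, which expresses constraints symbolically.

```python
import random


def box_constraint_violations(state, lower, upper):
    """Return the indices of state entries that fall outside [lower, upper].

    Hypothetical helper: a plain-Python stand-in for a box (bound) constraint.
    """
    return [
        i
        for i, (value, lo, hi) in enumerate(zip(state, lower, upper))
        if value < lo or value > hi
    ]


def disturb_input(action, std, rng):
    """Add zero-mean Gaussian noise with standard deviation `std` to each input.

    Hypothetical helper: one simple way to model an input disturbance.
    """
    return [a + rng.gauss(0.0, std) for a in action]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same noise
    noisy_action = disturb_input([0.5, -0.2], std=0.05, rng=rng)
    violated = box_constraint_violations(
        state=[0.1, 2.5, -3.0],
        lower=[-1.0, -1.0, -1.0],
        upper=[1.0, 1.0, 1.0],
    )
    print("noisy action:", noisy_action)
    print("violated state indices:", violated)
```

A controller that stays within bounds under the nominal input but violates them once noise is added is not robust in the sense tested here.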

[Figure: problem illustration]

@article{brunke2021safe,
  title = {Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning},
  author = {Lukas Brunke and Melissa Greeff and Adam W. Hall and Zhaocong Yuan and Siqi Zhou and Jacopo Panerati and Angela P. Schoellig},
  journal = {Annual Review of Control, Robotics, and Autonomous Systems},
  year = {2021},
  url = {https://arxiv.org/abs/2108.06266}
}

Install on Ubuntu/macOS

(optional) Create and access a Python 3.7 environment using conda

$ conda create -n safe python=3.7                                  # Create environment (named 'safe' here)
$ conda activate safe                                              # Activate environment 'safe'

Clone and install the safe-control-gym repository

$ git clone -b ar https://github.com/utiasDSL/safe-control-gym.git # Clone repository (the 'ar' branch specifically)
$ cd safe-control-gym                                              # Enter the repository
$ pip install -e .                                                 # Install the repository

Architecture

Overview of safe-control-gym's API:

[Figure: block diagram]

Getting Started

Familiarize yourself with the APIs and environments using the scripts in examples/

$ cd ./examples/                                                   # Navigate to the examples folder
$ python3 tracking.py  --overrides tracking.yaml                   # PID trajectory tracking with the 2D quadcopter
$ python3 verbose_api.py --system cartpole --overrides verbose_api.yaml  # Printout of the extended safe-control-gym APIs
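Both commands pass a YAML file through --overrides. Conceptually, an override file layers a few changed values on top of a default configuration. The sketch below shows that idea as a recursive dictionary merge; it is an assumption about the general pattern, not the repository's actual config loader, and the keys used (task, ctrl_freq, seed) are hypothetical.

```python
import copy


def apply_overrides(defaults, overrides):
    """Return a new dict in which `overrides` is layered on top of `defaults`.

    Nested dicts are merged key by key; any other value replaces the default.
    Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical default and override values, for illustration only.
    defaults = {"task": {"name": "cartpole", "ctrl_freq": 50}, "seed": 0}
    overrides = {"task": {"name": "quadrotor"}}
    print(apply_overrides(defaults, overrides))
```

With this merge, an override file only needs to list the settings that differ from the defaults.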

Systems Variables and 2D Quadrotor Lemniscate Trajectory Tracking

[Figure: systems trajectory]
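For readers unfamiliar with the term, a lemniscate is a figure-eight curve. The sketch below samples one in the x-z plane using the lemniscate of Gerono, x = A sin(p), z = A sin(p) cos(p). This parameterization, the function name, and its arguments are assumptions made for illustration; the repository's own trajectory generator may differ.

```python
import math


def lemniscate_waypoints(num_points=100, scale=1.0, period=10.0):
    """Sample one period of a figure-eight reference in the x-z plane.

    Returns a list of (t, x, z) tuples with t in [0, period).
    """
    waypoints = []
    for i in range(num_points):
        t = period * i / num_points
        phase = 2.0 * math.pi * t / period
        x = scale * math.sin(phase)
        z = scale * math.sin(phase) * math.cos(phase)
        waypoints.append((t, x, z))
    return waypoints


if __name__ == "__main__":
    for t, x, z in lemniscate_waypoints(num_points=8):
        print(f"t={t:5.2f}  x={x:+.3f}  z={z:+.3f}")
```

A tracking controller such as the PID in tracking.py would then try to follow a reference of this kind sample by sample.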

Verbose API Example

List of Implemented Controllers

Re-create the Results in "Safe Learning in Robotics" [arXiv link]

Branch ar (or release v0.5.0) is the codebase for our review article on safe control and RL.

To stay in touch, get involved or ask questions, please contact us via e-mail ({jacopo.panerati, zhaocong.yuan, adam.hall, siqi.zhou, lukas.brunke, melissa.greeff}@robotics.utias.utoronto.ca) or through this form.

Figure 6—Robust GP-MPC [1]

$ cd ../experiments/figure6/                                       # Navigate to the experiment folder
$ chmod +x create_fig6.sh                                          # Make the script executable, if needed
$ ./create_fig6.sh                                                 # Run the script (ca. 2')

This will use the models in safe-control-gym/experiments/figure6/trained_gp_model/ to generate

[Figure: gp-mpc]

To also re-train the GP models from scratch (ca. 30' on a laptop)

$ chmod +x create_trained_gp_model.sh                              # Make the script executable, if needed
$ ./create_trained_gp_model.sh                                     # Run the script (ca. 30')

Note: this will back up and then overwrite safe-control-gym/experiments/figure6/trained_gp_model/


Figure 7—Safe RL Exploration [2]

$ cd ../figure7/                                                   # Navigate to the experiment folder
$ chmod +x create_fig7.sh                                          # Make the script executable, if needed
$ ./create_fig7.sh                                                 # Run the script (ca. 5'')

This will use the data in safe-control-gym/experiments/figure7/safe_exp_results.zip/ to generate

[Figure: safe-exp]

To also re-train all the controllers/agents (warning: this takes >24hrs on a laptop; if necessary, run the loops in the Bash script separately: PPO, PPO with reward shaping, and the Safe Explorer)

$ chmod +x create_safe_exp_results.sh                              # Make the script executable, if needed
$ ./create_safe_exp_results.sh                                     # Run the script (>24hrs)

Note: this script will (over)write the results in safe-control-gym/experiments/figure7/safe_exp_results/; if you do not run the re-training to completion, delete the partial results with rm -r -f ./safe_exp_results/ before running ./create_fig7.sh again.


Figure 8—Model Predictive Safety Certification [3]

(required) Obtain MOSEK's license (free for academia). Once you have received (via e-mail) and downloaded the license to your own ~/Downloads folder, install it by executing

$ mkdir ~/mosek                                                    # Create MOSEK license folder in your home '~'
$ mv ~/Downloads/mosek.lic ~/mosek/                                # Move the downloaded MOSEK license to '~/mosek/'
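Before running the experiment, it can help to confirm that the license file is where MOSEK will look for it. The sketch below checks the MOSEKLM_LICENSE_FILE environment variable first and then ~/mosek/mosek.lic. The helper name find_mosek_license is hypothetical, and the check only handles file paths (not port@host license servers), so treat it as a convenience, not a substitute for MOSEK's own diagnostics.

```python
import os
from pathlib import Path


def find_mosek_license(home=None, env=None):
    """Return the path of the first MOSEK license file found, or None.

    Checks the MOSEKLM_LICENSE_FILE variable (treated as a file path),
    then <home>/mosek/mosek.lic.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    candidates = []
    if env.get("MOSEKLM_LICENSE_FILE"):
        candidates.append(Path(env["MOSEKLM_LICENSE_FILE"]))
    candidates.append(home / "mosek" / "mosek.lic")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_mosek_license()
    if found is None:
        print("No MOSEK license file found; see the installation steps above.")
    else:
        print(f"MOSEK license file found at {found}")
```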

Then run

$ cd ../figure8/                                                   # Navigate to the experiment folder
$ chmod +x create_fig8.sh                                          # Make the script executable, if needed
$ ./create_fig8.sh                                                 # Run the script (ca. 1')

This will use the unsafe (pre-trained) PPO controller/agent in folder safe-control-gym/experiments/figure8/unsafe_ppo_model/ to generate

[Figures: mpsc-1, mpsc-2, mpsc-3]

To also re-train the unsafe PPO controller/agent (ca. 2' on a laptop)

$ chmod +x create_unsafe_ppo_model.sh                              # Make the script executable, if needed
$ ./create_unsafe_ppo_model.sh                                     # Run the script (ca. 2')

Note: this script will (over)write the model in safe-control-gym/experiments/figure8/unsafe_ppo_model/

References

Related Open-source Projects


University of Toronto's Dynamic Systems Lab / Vector Institute for Artificial Intelligence