Control Barrier Function-constrained Proximal Policy Optimization

This repository provides the framework used to conduct the experiments for our paper "Sampling-Based Safe Reinforcement Learning for Nonlinear Dynamical Systems", appearing in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. The paper is available here.

Specifically, this repo contains the following:

  1. Sampling-based safety-constrained PPO
  2. Constrained Beta policy
  3. Projection-based (safety-filter) benchmark

A Beta policy is constrained to the safe control set obtained from the cbf function, which encodes the Control Barrier Function (CBF) based safety constraints. This policy is then updated using proximal policy optimization, adapted from Stable Baselines3.
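To illustrate the idea of constraining a Beta policy to a CBF-derived safe control set, here is a minimal sketch for a scalar control input. It is not the repository's implementation: the function names (`safe_interval`, `sample_safe_action`), the scalar control-affine CBF condition `lf_h + lg_h * u + alpha * h >= 0`, and all numeric values are assumptions chosen for illustration.

```python
import random


def safe_interval(lf_h, lg_h, h, alpha, u_min, u_max):
    """Controls in [u_min, u_max] satisfying lf_h + lg_h*u + alpha*h >= 0.

    Hypothetical helper: lf_h and lg_h stand for the Lie derivatives of the
    barrier h along the drift and input dynamics. Returns None if no
    admissible control satisfies the constraint.
    """
    lo, hi = u_min, u_max
    if lg_h > 0:
        # Constraint is a lower bound on u.
        lo = max(lo, -(lf_h + alpha * h) / lg_h)
    elif lg_h < 0:
        # Dividing by a negative number flips it into an upper bound.
        hi = min(hi, -(lf_h + alpha * h) / lg_h)
    elif lf_h + alpha * h < 0:
        # Control has no effect on the constraint and it is violated.
        return None
    return (lo, hi) if lo <= hi else None


def sample_safe_action(a, b, interval, rng):
    """Draw z ~ Beta(a, b) on [0, 1] and rescale it onto the safe interval."""
    lo, hi = interval
    z = rng.betavariate(a, b)
    return lo + z * (hi - lo)


# Assumed toy values: the CBF condition reduces to u >= 0.25.
interval = safe_interval(lf_h=-1.0, lg_h=2.0, h=0.5, alpha=1.0,
                         u_min=-1.0, u_max=1.0)
rng = random.Random(0)
actions = [sample_safe_action(2.0, 3.0, interval, rng) for _ in range(5)]
```

Because the Beta distribution has bounded support, every sampled action lies inside the safe interval by construction, so no post-hoc correction of the action is needed in this toy setting.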

In addition, we created a benchmark of projection-based (safety-filter) safe RL policies, using CBFs to obtain the safety constraints. This essentially leads to a projection-based safe RL policy like that proposed in Cheng et al., 2019.
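The projection step behind such a safety filter can be sketched as follows, assuming a single affine CBF constraint of the form `a . u >= b` and no actuator limits. In that special case the Euclidean projection has a closed form; with several constraints a QP solver would typically be used instead. The function name `project_to_safe_halfspace` and the numbers are hypothetical and not taken from the repository.

```python
def project_to_safe_halfspace(u_nominal, a, b):
    """Euclidean projection of u_nominal onto the half-space {u : a . u >= b}.

    In CBF terms (an assumption of this sketch), a would play the role of
    Lg h and b the role of -(Lf h + alpha * h).
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_nominal)) - b
    if slack >= 0:
        # The nominal action is already safe, so the filter leaves it alone.
        return list(u_nominal)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        raise ValueError("constraint is infeasible: a is zero and b > 0")
    # Move the minimum distance along a needed to reach the boundary.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nominal, a)]


# Assumed toy values: the unsafe action [0, 0] is moved to the boundary,
# while the already-safe action [3, 0] is returned unchanged.
filtered = project_to_safe_halfspace([0.0, 0.0], a=[1.0, 1.0], b=2.0)
unchanged = project_to_safe_halfspace([3.0, 0.0], a=[1.0, 1.0], b=2.0)
```

Unlike the constrained Beta policy, this approach lets the policy propose any action and corrects it afterwards, so the executed action can differ from the one the policy sampled.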

Some of the dynamical components involved in our safe quadcopter gym environment are adapted from the repo:


  1. To install, first set up your preferred virtual environment, then run pip install -e . from the repository root.
  2. For Quadcopter experiments: go to the experiments directory and select the experiment that you wish to run.
  3. For Pendulum experiments: go to the Pendulum directory and run the experiment script there.
  4. Plots and reward arrays are stored in the corresponding experiment folder.