
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization


Safety Critic Policy Optimization (SCPO)

SCPO is a safe reinforcement learning algorithm. This repo is a fork of Stable Baselines3.

Installation

Note: Stable-Baselines3 supports PyTorch >= 1.11

Prerequisites

SCPO requires Python 3.7+.

Install using pip

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

We use environments from Bullet-Safety-Gym. Please follow the installation steps from https://github.com/SvenGronauer/Bullet-Safety-Gym.

To run PyTorch on a GPU, install CUDA and PyTorch separately; see https://pytorch.org/.
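The version requirements above can be checked before training with a short script. This is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and the two minimum versions are copied from the notes in this section. It uses only the standard library, plus `torch` if it is already installed.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 7)   # from "SCPO requires Python 3.7+"
MIN_TORCH = (1, 11)   # from "Stable-Baselines3 supports PyTorch >= 1.11"


def parse_version(text):
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (text,))
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so the script also runs without PyTorch

        found = str(torch.__version__)
        if parse_version(found) < MIN_TORCH:
            problems.append(
                "PyTorch >= %d.%d is required, found %s" % (MIN_TORCH + (found,))
            )
        # Reports whether a CUDA device is usable; False means CPU-only mode.
        print("CUDA available:", torch.cuda.is_available())
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

If the script prints no "Problem:" lines, the Python and PyTorch versions meet the stated minimums. It does not check that Bullet-Safety-Gym is installed.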

Training

Example training code is in train.py. To train models with the best hyperparameters, see train_best_hyper.py.

Running the environment

Check play.py.


Benchmark

[Figures: five benchmark plots]

Citation

If you find this repository useful, please cite the paper:

@article{mhamed2023scpo,
  title={SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization},
  author={Mhamed, Jaafar and Gu, Shangding},
  journal={arXiv preprint arXiv:2311.00880},
  year={2023}
}