
Code for L4DC 2022 paper: Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning.

Primary language: Python

Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning

This repository is the official implementation of Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning. The code base is built on the Parallel Asynchronous Buffer-Actor-Learner (PABAL) architecture, which includes implementations of most common RL algorithms with state-of-the-art training efficiency. If you are interested in PABAL or want to contribute to it, you can contact me or the original creator. After the paper was submitted, I also reimplemented the method with PPO on TensorFlow 1 (TF1). The TF1 code is modified directly from the open-source safety-gym code and is easier to set up and run. The results of the two versions differ only in the early training stage, which does not affect the claimed performance.


First, install Safety-gym. Then replace the engine.py in the safety-gym package with our custom engine. We modified engine.py to estimate the distance and velocity, so that the code generalizes to other tasks that provide distance and velocity observations/signals.
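The replacement step can be scripted. The following is a minimal sketch, assuming the installed package keeps its engine at `safety_gym/envs/engine.py` and that the custom engine sits somewhere in your checkout; the function names and the `.bak` backup convention are illustrative and not part of this repository.

```python
import shutil
from pathlib import Path


def find_safety_gym_dir() -> Path:
    """Return the directory of the installed safety_gym package."""
    import safety_gym  # imported lazily: requires Safety Gym to be installed

    return Path(safety_gym.__file__).resolve().parent


def replace_engine(custom_engine: Path, package_dir: Path) -> Path:
    """Back up the packaged engine.py, then copy the custom engine over it.

    Assumes the engine lives at <package_dir>/envs/engine.py.
    Returns the path of the backup file.
    """
    target = package_dir / "envs" / "engine.py"
    if not target.is_file():
        raise FileNotFoundError(f"no engine.py found at {target}")
    backup = target.with_name(target.name + ".bak")
    if not backup.exists():
        # Keep only the first backup so the pristine file is never overwritten.
        shutil.copy2(target, backup)
    shutil.copy2(custom_engine, target)
    return backup


# Example call (the source path is a placeholder for your checkout):
# replace_engine(Path("/your/path/to/custom/engine.py"), find_safety_gym_dir())
```

Keeping a backup makes it easy to restore the stock engine if you also use Safety Gym for other projects.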

To install other requirements:

$ pip install -U ray
$ pip install tensorflow==2.5.0
$ pip install tensorflow_probability==0.13.0
$ pip install seaborn matplotlib
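After installing, it can help to confirm that the pinned versions are the ones actually in the environment. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); the `PINNED` table and the `find_mismatches` helper are illustrative names, and the lookup of distribution names may behave slightly differently across Python versions.

```python
from importlib import metadata

# Pins copied from the install commands above; ray, seaborn and matplotlib
# are installed without a pinned version there, so they are not checked.
PINNED = {"tensorflow": "2.5.0", "tensorflow_probability": "0.13.0"}


def find_mismatches(pinned, installed_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not met.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Example call:
# for name, (wanted, found) in find_mismatches(PINNED).items():
#     print(f"{name}: expected {wanted}, found {found}")
```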


To train the algorithm(s) in the paper, run these commands:

$ export PYTHONPATH=/your/path/to/Reachability_Constrained_RL/:$PYTHONPATH
$ cd ./train_scripts/
$ python train_scripts4fsac.py                # FAC-SIS / FAC-\phi_0,\phi_h (selected in the config: whether \phi is updated, and the initial \phi)
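If you prefer to launch training from Python rather than from a shell, the same `PYTHONPATH` setup can be reproduced for a subprocess. This is a rough sketch; `env_with_pythonpath` is a hypothetical helper, and the repository path is the same placeholder used in the commands above.

```python
import os


def env_with_pythonpath(repo_root, base_env=None):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Prepend so that modules from this repository take precedence.
    env["PYTHONPATH"] = repo_root + (os.pathsep + existing if existing else "")
    return env


# Example call, mirroring the shell commands above:
# import subprocess, sys
# repo = "/your/path/to/Reachability_Constrained_RL"
# subprocess.run(
#     [sys.executable, "train_scripts4fsac.py"],
#     cwd=os.path.join(repo, "train_scripts"),
#     env=env_with_pythonpath(repo),
#     check=True,
# )
```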

Training supervision

Training progress can be monitored with TensorBoard:

$ cd ./results/
$ tensorboard --logdir=. --bind_all


To test and evaluate trained policies, run:

$ python train_scripts4fsac.py --mode testing --test_dir <your_log_dir> --test_iter_list <iter_nums>

The results will be recorded in /results/<ENV_NAME>/<ALGO_NAME>/<EXP_TIME>/logs/tester.


When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with me before making a change.

Feel free to cite our paper with BibTeX:

