FSRL

🚀 A fast safe reinforcement learning library in PyTorch


Key Features | Documentation | Installation | Quick Start | Contributing


The Fast Safe Reinforcement Learning (FSRL) package provides modularized implementations of Safe RL algorithms based on PyTorch and the Tianshou framework. Safe RL is a rapidly evolving subfield of RL, focusing on ensuring the safety of learning agents during the training and deployment process. The study of Safe RL is essential because it addresses the critical challenge of preventing unintended or harmful actions while still optimizing an agent's performance in complex environments.

This project offers high-quality and fast implementations of popular Safe RL algorithms, serving as an ideal starting point for those looking to explore and experiment in this field. By providing a comprehensive and accessible toolkit, the FSRL package aims to accelerate research in this crucial area and contribute to the development of safer and more reliable RL-powered systems.

Please note that this project is still under active development, and major updates might be expected. Your feedback and contributions are highly appreciated, as they help us improve the FSRL package.

🌟 Key Features

FSRL is designed with several key aspects in mind:

  • High-quality implementations. For instance, the CPO implementation in SafetyGym fails to satisfy constraints according to its own benchmark results, so safe RL papers that adopt that implementation may also report constraint violations. We found that with our implementation and appropriate hyper-parameters, CPO can achieve good safety performance in most tasks.
  • Fast training speed. FSRL aims to accelerate experimentation and benchmarking by providing fast training for popular safe RL tasks. For example, most algorithms can solve the SafetyCarCircle-v0 task in 10 minutes with 4 CPUs. The CVPO implementation also trains 5x faster than the original implementation.
  • Well-tuned hyper-parameters. We carefully studied the effects of key hyper-parameters for these algorithms and plan to provide a practical guide for tuning them. We believe both implementations and hyper-parameters play a crucial role in learning a successful safe RL agent.
  • Modular design and easy usability. FSRL is built upon the elegant RL framework Tianshou. We provide an agent wrapper, refactored loggers for both TensorBoard and Wandb, and pyrallis configuration support to further facilitate usage. Our algorithms also support multiple constraints and standard RL tasks (such as MuJoCo).

The implemented safe RL algorithms include:

| Algorithm      | Type              | Description                                        |
|----------------|-------------------|----------------------------------------------------|
| CPO            | on-policy         | Constrained Policy Optimization                    |
| FOCOPS         | on-policy         | First Order Constrained Optimization in Policy Space |
| PPOLagrangian  | on-policy         | PPO with PID Lagrangian                            |
| TRPOLagrangian | on-policy         | TRPO with PID Lagrangian                           |
| DDPGLagrangian | off-on-policy (1) | DDPG with PID Lagrangian                           |
| SACLagrangian  | off-on-policy (1) | SAC with PID Lagrangian                            |
| CVPO           | off-policy        | Constrained Variational Policy Optimization        |

(1): Off-on-policy means that the base learning algorithm is off-policy, but the Lagrange multiplier is updated in an on-policy fashion. Our previous findings suggest that an off-policy style Lagrange multiplier update may result in poor performance.
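To make the "PID Lagrangian" entries in the table more concrete, here is a minimal sketch of a PID-controlled Lagrange multiplier. It follows the general idea of PID Lagrangian methods and is not FSRL's implementation: the class name `PIDLagrangeMultiplier`, the gain values, and the cost limit are illustrative assumptions. In the off-on-policy setting described in note (1), `episode_cost` would be estimated from fresh rollouts of the current policy.

```python
class PIDLagrangeMultiplier:
    """Illustrative PID controller for a Lagrange multiplier (hypothetical helper)."""

    def __init__(self, kp=0.05, ki=0.0005, kd=0.1, cost_limit=10.0):
        # Gains and cost limit are placeholder values, not FSRL defaults.
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0      # accumulated constraint violation, kept >= 0
        self.prev_cost = None    # cost seen at the previous update

    def update(self, episode_cost):
        """Return the new multiplier given the latest estimated episode cost."""
        error = episode_cost - self.cost_limit
        # Integral term: accumulate the violation, clipped at zero.
        self.integral = max(0.0, self.integral + error)
        # Derivative term: only react when the cost is increasing.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # The multiplier itself is projected to be non-negative.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


controller = PIDLagrangeMultiplier()
for cost in [25.0, 20.0, 12.0, 8.0]:
    print(f"cost={cost:5.1f}  multiplier={controller.update(cost):.4f}")
```

The multiplier would then weight the cost term in the policy loss, for example `loss = -reward_objective + multiplier * cost_objective`, where both objective names are again placeholders.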

The implemented algorithms are well-tuned for many tasks in the following safe RL environments, which cover the majority of tasks in recent safe RL papers:

  • BulletSafetyGym. FSRL installs this environment by default as the testing ground.
  • SafetyGymnasium. Note that you need to install it from source because our current version adopts the gymnasium API.

Note that the latest versions of FSRL and the above environments use the gymnasium >= 0.26.3 API. If you want to use the old gym API, as in safety_gym, you can simply change import gymnasium as gym to import gym in the example scripts.

🔍 Documentation

The tutorials and API documentation are hosted on fsrl.readthedocs.io.

The majority of the API design in FSRL follows Tianshou, and we aim to reuse their modules as much as possible. For example, the Env, Batch, Buffer, and (most) Net modules are used directly in our repo. This means that you can refer to their comprehensive documentation to gain a good understanding of the code structure. We highly recommend reading the Tianshou tutorials.

We observe that most existing safe RL environments can be solved quite effectively by a neural network with only a few layers. Therefore, we provide an 'Agent' class with default MLP networks to facilitate usage. You can refer to the tutorial for more details.

Example training and evaluation scripts for both the default MLP agent and customized networks are available in the examples folder.

🛠️ Installation

FSRL requires Python >= 3.8. You can install it from source by:

git clone https://github.com/liuzuxin/fsrl.git
cd fsrl
pip install -e .

You can also directly install it with pip through GitHub:

pip install git+https://github.com/liuzuxin/fsrl.git@main --upgrade

You can check whether the installation is successful by:

import fsrl
print(fsrl.__version__)

🚀 Quick Start

Training with default MLP agent

This is an example of training a PPO-Lagrangian agent with a Tensorboard logger and default parameters.

First, import relevant packages:

import bullet_safety_gym
import gymnasium as gym
from tianshou.env import DummyVectorEnv
from fsrl.agent import PPOLagAgent
from fsrl.utils import TensorboardLogger

Then initialize the environment, logger, and agent:

task = "SafetyCarCircle-v0"
# init logger
logger = TensorboardLogger("logs", log_txt=True, name=task)
# init the PPO Lag agent with default parameters
agent = PPOLagAgent(gym.make(task), logger)
# init the envs
training_num, testing_num = 10, 1
train_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(training_num)])
test_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(testing_num)])

Finally, start training:

agent.learn(train_envs, test_envs, epoch=100)

You can check the experiment results in the logs/SafetyCarCircle-v0 folder.

Training with the example scripts

We provide easy-to-use example training scripts for all the agents in the examples/mlp folder. By default, each training script uses the Wandb logger and the Pyrallis configuration system. The default hyper-parameters are located in the fsrl/config folder. There are three ways to run an experiment with your customized hyper-parameters:

M1. Directly override the parameters via the command line:

python examples/mlp/train_ppol_agent.py --arg value --arg2 value2 ...

where --arg specifies the parameter you want to override, for example --task SafetyAntRun-v0. Note that if you specify --use_default_cfg 1, the script will automatically use the task's default parameters for training. We plan to release more default configs in the future.
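The override mechanism can be pictured as "every config field becomes a command-line flag whose default is the field's default". The sketch below illustrates that idea with the standard library's argparse instead of Pyrallis, so it is not how the example scripts are actually implemented. `DemoTrainCfg`, its default values, and `parse_overrides` are hypothetical names chosen for illustration.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class DemoTrainCfg:
    """Hypothetical config; the default values are placeholders."""
    task: str = "SafetyCarCircle-v0"
    epoch: int = 100
    lr: float = 0.0005


def parse_overrides(cfg_cls, argv):
    """Build a config where each --field flag overrides that field's default."""
    parser = argparse.ArgumentParser()
    for f in fields(cfg_cls):
        # Use the default's type so "--epoch 200" is parsed as an int.
        parser.add_argument(f"--{f.name}", type=type(f.default), default=f.default)
    return cfg_cls(**vars(parser.parse_args(argv)))


cfg = parse_overrides(DemoTrainCfg, ["--task", "SafetyAntRun-v0", "--epoch", "200"])
print(cfg)
```

Fields that are not mentioned on the command line keep their defaults, which is the behavior the `--arg value` form above relies on.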

M2. Use a pre-defined YAML, JSON, or TOML config.

For example, to use a learning rate and number of training epochs that differ from our defaults, create a my_cfg.yaml:

task: "SafetyDroneCircle-v0"
epoch: 500
lr: 0.001

Then you can start training with the above parameters by:

python examples/mlp/train_ppol_agent.py --config my_cfg.yaml

where --config specifies the path to the configuration file.

M3. Inherit from the config dataclass in the fsrl/config folder.

For example, you can inherit from the PPOLagAgent config by:

from dataclasses import dataclass
import pyrallis
from fsrl.config.ppol_cfg import TrainCfg

@dataclass
class MyCfg(TrainCfg):
    task: str = "SafetyDroneCircle-v0"
    epoch: int = 500
    lr: float = 0.001

@pyrallis.wrap()
def train(args: MyCfg):
    ...

Then, you can start training with your own default configs:

python examples/mlp/train_ppol_agent.py

Note that our example scripts support the auto_name feature: they automatically compare your specified hyper-parameters with our default ones and create the experiment name based on the differences. By default, training statistics are saved in the logs directory.
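As a rough illustration of the auto-naming idea described above, the sketch below compares a user config with the defaults and joins the differing fields into a name. It is an assumption about how such a feature could work, not FSRL's actual code: `DemoCfg`, its default values, `auto_name_sketch`, and the naming format are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoCfg:
    """Hypothetical config; the default values are placeholders."""
    task: str = "SafetyCarCircle-v0"
    epoch: int = 100
    lr: float = 0.0005


def auto_name_sketch(default_cfg, user_cfg, prefix=""):
    """Name an experiment after the fields that differ from the defaults."""
    diffs = []
    for f in fields(default_cfg):
        default_val = getattr(default_cfg, f.name)
        user_val = getattr(user_cfg, f.name)
        if user_val != default_val:
            diffs.append(f"{f.name}{user_val}")
    # Fall back to a fixed label when nothing was overridden.
    name = "_".join(diffs) if diffs else "default"
    return f"{prefix}-{name}" if prefix else name


print(auto_name_sketch(DemoCfg(), DemoCfg(epoch=500, lr=0.001), prefix="ppol"))
```

Naming runs by their differences from the defaults keeps log folder names short while still making each run identifiable.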

Training with customized networks

While the pre-defined MLP agent is sufficient for solving many existing safe RL benchmarks, for more complex tasks, it may be necessary to customize the value and policy networks. Our modular design supports Tianshou's style training scripts. Example training scripts can be found in the examples/customized folder. For more details on building networks, please refer to Tianshou's tutorial, as our algorithms are mostly compatible with their networks.

Evaluate trained models

To evaluate a trained model, for example, a pre-trained PPOLag model in the logs/exp_name folder, run:

python examples/mlp/eval_ppol_agent.py --path logs/exp_name --eval_episodes 20

It will load the saved config.yaml from logs/exp_name/config.yaml and pre-trained model from logs/exp_name/checkpoint/model.pt, run 20 episodes and print the average reward and cost. If the best model is saved during training, you can evaluate it by setting --best 1.
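The reported numbers are per-episode averages of reward and cost. The sketch below shows how such averages could be computed with a gymnasium-style loop. It is not the evaluation script itself: `ToyCostEnv` is a hypothetical stand-in environment, `evaluate` is an illustrative helper, and reading the cost from `info["cost"]` is an assumption about the environment.

```python
class ToyCostEnv:
    """Hypothetical stand-in that mimics the gymnasium reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        reward = 1.0
        # Deterministic toy cost: a violation on every even step.
        cost = 1.0 if self.t % 2 == 0 else 0.0
        truncated = self.t >= self.horizon
        return 0.0, reward, False, truncated, {"cost": cost}


def evaluate(env, policy, episodes=20):
    """Return (average episode reward, average episode cost)."""
    total_reward, total_cost = 0.0, 0.0
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total_reward += reward
            # Assumption: the environment reports the step cost in info["cost"].
            total_cost += info.get("cost", 0.0)
            done = terminated or truncated
    return total_reward / episodes, total_cost / episodes


avg_reward, avg_cost = evaluate(ToyCostEnv(), policy=lambda obs: 0, episodes=20)
print(f"average reward: {avg_reward:.2f}, average cost: {avg_cost:.2f}")
```

In practice the policy would be the loaded pre-trained model and the environment would be the task stored in the saved config.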

Related Projects

FSRL is heavily inspired by the Tianshou project. In addition, there are several other remarkable safe RL-related projects:

  • Safety-Gymnasium, a well-maintained and customizable collection of safe RL environments based on Mujoco.
  • Bullet-Safety-Gym, a tuned and fast collection of safe RL environments based on PyBullet.
  • Safe-Multi-Agent-Mujoco, a collection of multi-agent safe RL environments based on Mujoco.
  • Safe-Control-Gym, a learning-based control and RL library with PyBullet.
  • OmniSafe, a well-maintained infrastructural framework for safe RL algorithms.
  • SafePO, another benchmark repository for safe RL algorithms.

Contributing

The main maintainers of this project are: Zuxin Liu, Zijian Guo.

If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!