OmniSafe
OmniSafe is a comprehensive and trustworthy benchmark for safe reinforcement learning, covering a multitude of SafeRL domains and delivering a new suite of testing environments.
The simulation environments built around OmniSafe, together with its series of reliable algorithm implementations, will help the SafeRL research community replicate and improve on excellent prior work more easily, while also facilitating the validation of new ideas and new algorithms.
Table of Contents
- Overview
- Implemented Algorithms
- SafeRL Environments
- Installation
- Getting Started
- The OmniSafe Team
- License
Overview
Here we provide a table comparing OmniSafe's algorithm core with existing algorithm baselines.
| SafeRL Platform | Backend | Engine | # Safe Algo. | Parallel CPU/GPU | New Gym API (4) | Vision Input |
|---|---|---|---|---|---|---|
| Safety-Gym | TF1 | mujoco-py (1) | 3 | CPU only (mpi4py) | ❌ | minimally supported |
| safe-control-gym | PyTorch | PyBullet | 5 (2) | ❌ | ❌ | ❌ |
| Velocity-Constraints (3) | N/A | N/A | N/A | N/A | ❌ | ❌ |
| mujoco-circle | PyTorch | N/A | 0 | N/A | ❌ | ❌ |
| OmniSafe | PyTorch | MuJoCo 2.3.0+ | 25+ | torch.distributed | ✔️ | ✔️ |
(1): Maintenance mode (expect bug fixes and minor updates); the last commit was on 19 Nov 2021. Safety Gym depends on mujoco-py 2.0.2.7, which was last updated on Oct 12, 2019.
(2): We count only the safe algorithms.
(3): There is no official library for velocity-constrained tasks; their cost constraints are constructed from the `info` dict returned by the environment (see the sketch below these notes). Because the task is widely used in SafeRL research, we encapsulate it in OmniSafe.
(4): The gym 0.26.0 release redefined the environment interaction API.
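As a minimal illustration of note (3), here is a hedged sketch of deriving a cost signal from the `info` dict. It assumes a Gymnasium MuJoCo locomotion task that reports `x_velocity` in `info`; the threshold value is illustrative, not OmniSafe's actual configuration.

```python
# Minimal sketch, not OmniSafe's implementation: build a velocity cost
# from the `info` dict of a Gymnasium MuJoCo locomotion task.
import gymnasium as gym

VELOCITY_LIMIT = 2.0  # illustrative threshold, not an official value

env = gym.make('HalfCheetah-v4')
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    # Cost signal: 1 when the agent exceeds the speed limit, else 0.
    cost = float(abs(info.get('x_velocity', 0.0)) > VELOCITY_LIMIT)
    if terminated or truncated:
        obs, info = env.reset()
```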
Implemented Algorithms
Algorithms are grouped into three categories: On Policy, Off Policy, and Other.
Notes: IPO, PCPO, CRPO, P3O, and CUP will be released before 2022.12.1. Model-based algorithms are under testing and will be released before 2022.11.25. Offline Safe and Control will be released before 2022.12.1.
SafeRL Environments
Safety Gymnasium
We designed a variety of safety-enhanced learning tasks around the latest version of Gymnasium, including safety-run, safety-circle, safety-goal, safety-button, etc., leading to a unified safety-enhanced learning benchmark environment called Safety_Gymnasium.
Further, to facilitate the progress of community research, we redesigned Safety_Gym, removed its dependency on mujoco_py, rebuilt it on top of MuJoCo, and fixed some bugs.
After careful testing, we confirmed that it has the same dynamics parameters and training environment as the original Safety Gym, and named it safety_gym_v2.
Here is a list of all the environments we support; some of them are still being tested in our baselines, and we will gradually release them within a month.
Each environment combines a Task, a Difficulty level, and an Agent; the supported ids compose as in the sketch below.
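A minimal sketch, assuming the id pattern `Safety{Robot-id}{Task-id}{0/1/2}-v0` given in the Examples section below; that every (agent, task, difficulty) combination is already registered is an assumption, since some environments are released gradually.

```python
# Hedged sketch: enumerate Safety_Gymnasium ids from the (agent, task,
# difficulty) grid. The id pattern comes from the Examples section below;
# availability of every combination is an assumption.
import safety_gymnasium

for robot in ('Point', 'Car'):
    for task in ('Goal', 'Push', 'Button'):
        for level in (0, 1, 2):
            env_id = f'Safety{robot}{task}{level}-v0'
            env = safety_gymnasium.make(env_id)
            print(env_id, env.observation_space.shape)
```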
Vision-based Safe RL
Vision-based safe reinforcement learning lacks realistic scenarios. Although the original safety_gym minimally supported visual input, its scenarios were too homogeneous. To facilitate the validation of vision-based safe reinforcement learning algorithms, we have developed a set of realistic vision-based safe RL task environments, which are currently being validated on baselines; we will release that part of the environments in Safety_Gymnasium within a month.
As an appetizer, sample images are shown below.
Environment Usage
Note: we support the new Gym APIs.
```python
import safety_gymnasium

env_name = 'SafetyPointGoal1-v0'
env = safety_gymnasium.make(env_name)

obs, info = env.reset()
terminated, truncated = False, False
while not (terminated or truncated):  # also stop on time-limit truncation
    act = env.action_space.sample()   # random policy for demonstration
    # The new API returns cost as an extra signal alongside reward.
    obs, reward, cost, terminated, truncated, info = env.step(act)
    env.render()
```
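Building on the loop above, here is a minimal sketch, using only the step API shown there, of tracking episodic return and episodic cost, the two quantities a SafeRL algorithm trades off:

```python
import safety_gymnasium

env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
ep_ret, ep_cost = 0.0, 0.0
terminated, truncated = False, False
while not (terminated or truncated):
    obs, reward, cost, terminated, truncated, info = env.step(env.action_space.sample())
    ep_ret += reward   # task performance
    ep_cost += cost    # accumulated constraint violation
print(f'episode return: {ep_ret:.2f}, episode cost: {ep_cost:.2f}')
```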
Installation
Prerequisites
OmniSafe requires Python 3.8+ and PyTorch 1.10+.
Install from source
```bash
git clone https://github.com/PKU-MARL/omnisafe
cd omnisafe
conda create -n omnisafe python=3.8
conda activate omnisafe
# please refer to https://pytorch.org/get-started/previous-versions/ and install pytorch
# install omnisafe
pip install -e .
# install safety_gymnasium
cd omnisafe/envs/Safety_Gymnasium
pip install -e .
```
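Optionally, a quick post-install sanity check. This is a minimal sketch: it assumes both packages installed cleanly and only verifies the prerequisites stated above.

```python
# Hedged sanity check: confirm the stated prerequisites and clean imports.
import sys

import torch
import omnisafe
import safety_gymnasium

assert sys.version_info >= (3, 8), 'OmniSafe requires Python 3.8+'
major, minor = (int(x) for x in torch.__version__.split('.')[:2])
assert (major, minor) >= (1, 10), 'OmniSafe requires PyTorch 1.10+'
print('omnisafe and safety_gymnasium import successfully')
```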
Examples
```bash
cd examples/
python train_on_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 1 --seed 0
```

- algo: PolicyGradient, PPO, PPOLag, NaturalPG, TRPO, TRPOLag, PDO, NPGLag, CPO, PCPO, FOCOPS, CPPOPid
- env-id: Safety{Robot-id}{Task-id}{0/1/2}-v0, where Robot-id is one of {Point, Car} and Task-id is one of {Goal, Push, Button}
- parallel: number of parallel training processes
Getting Started
1. Run Agent from preset yaml file
```python
import omnisafe

env = omnisafe.Env('SafetyPointGoal1-v0')
agent = omnisafe.Agent('PPOLag', env)
agent.learn()

# Evaluate the trained agent:
# obs = env.reset()
# for i in range(1000):
#     action, _states = agent.predict(obs, deterministic=True)
#     obs, reward, cost, done, info = env.step(action)
#     env.render()
#     if done:
#         obs = env.reset()
# env.close()
```
2. Run Agent from custom config dict
```python
import omnisafe

env = omnisafe.Env('SafetyPointGoal1-v0')
custom_dict = {'epochs': 1, 'data_dir': './runs'}
agent = omnisafe.Agent('PPOLag', env, custom_cfgs=custom_dict)
agent.learn()

# Evaluate the trained agent:
# obs = env.reset()
# for i in range(1000):
#     action, _states = agent.predict(obs, deterministic=True)
#     obs, reward, cost, done, info = env.step(action)
#     env.render()
#     if done:
#         obs = env.reset()
# env.close()
```
3. Run Agent from custom terminal config
```bash
cd omnisafe/examples
python train_on_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 5 --epochs 1
```
The OmniSafe Team
OmniSafe is currently maintained by Borong Zhang, Jiayi Zhou, Juntao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji, under the instruction of Prof. Yaodong Yang. If you have any questions while using OmniSafe, don't hesitate to ask on the GitHub issue page; we will reply within 2-3 working days.
License
OmniSafe is released under the Apache License 2.0.