CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:
- 📜 Single-file implementation
  - Every detail of an algorithm lives in the algorithm's own file, which makes it easier to fully understand the algorithm and to do research with it.
- 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
- 📈 Tensorboard Logging
- 🪛 Local Reproducibility via Seeding
- 🎮 Gameplay Video Capturing
- 🧫 Experiment Management with Weights and Biases
- 💸 Cloud Integration with Docker and AWS
Good luck have fun 🚀
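As a rough illustration of the seeding feature listed above, the sketch below shows why a fixed seed makes a run repeatable. It uses only Python's standard `random` module. `rollout` is a hypothetical stand-in for a training run, not CleanRL code, and a real training script would presumably also need to seed any other random number generators it relies on.

```python
import random


def rollout(seed: int, steps: int = 5) -> list:
    """Hypothetical stand-in for a training run: draws `steps` random actions."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randrange(4) for _ in range(steps)]


if __name__ == "__main__":
    # The same seed reproduces the same action sequence.
    print(rollout(seed=1) == rollout(seed=1))  # prints True
```

Using a private `random.Random` instance rather than the module-level functions keeps the sequence independent of any other code that draws random numbers.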
- Deep Q-Learning (DQN)
  - dqn.py
    - For discrete action space.
  - dqn_atari.py
    - For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques.
  - dqn_atari_visual.py
    - Adds Q-value visualization for `dqn_atari.py`.
- Categorical DQN (C51)
  - c51.py
    - For discrete action space.
  - c51_atari.py
    - For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques.
  - c51_atari_visual.py
    - Adds return and Q-value visualization for `dqn_atari.py`.
- Proximal Policy Optimization (PPO)
  - All of the PPO implementations below are augmented with some code-level optimizations. See https://costa.sh/blog-the-32-implementation-details-of-ppo.html for more details.
  - ppo.py
    - For discrete action space.
  - ppo_continuous_action.py
    - For continuous action space. Also implements MuJoCo-specific code-level optimizations.
  - ppo_atari.py
    - For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques.
  - ppo_atari_visual.py
    - Adds action probability visualization for `ppo_atari.py`.
  - experiments/ppo_self_play.py
    - Implements a self-play agent for https://github.com/hardmaru/slimevolleygym
  - experiments/ppo_microrts.py
    - Implements invalid action masking and handling of the `MultiDiscrete` action space for https://github.com/vwxyzjn/gym-microrts
  - experiments/ppo_simple.py
    - (Not recommended for use) Naive implementation for discrete action space. I keep it here for educational purposes because I feel this is what most people would implement if they had just read the paper, usually unaware of the number of implementation details that come with a well-tuned PPO implementation.
  - experiments/ppo_simple_continuous_action.py
    - (Not recommended for use) Naive implementation for continuous action space.
- Soft Actor Critic (SAC)
  - sac_continuous_action.py
    - For continuous action space.
- Deep Deterministic Policy Gradient (DDPG)
  - ddpg_continuous_action.py
    - For continuous action space.
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
  - td3_continuous_action.py
    - For continuous action space.
- Apex Deep Q-Learning (Apex-DQN)
  - apex_dqn_atari_visual.py
    - For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques.
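The invalid action masking mentioned for experiments/ppo_microrts.py can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the common approach of replacing the logits of invalid actions with a large negative number before the softmax. The names `masked_softmax` and `fill` are hypothetical, and the code is not taken from the CleanRL script.

```python
import math


def masked_softmax(logits, valid_mask, fill=-1e8):
    """Softmax over `logits` after replacing invalid entries with `fill`."""
    masked = [x if ok else fill for x, ok in zip(logits, valid_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # The second action is marked invalid, so it should never be sampled.
    probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
    print(probs)  # the second entry is 0.0
```

Because the masked logit is so far below the others, its exponential underflows to zero and the remaining probabilities are renormalized over the valid actions only.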
To run experiments locally, give the following a try:
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
# we strongly recommend using `venv` to manage dependencies
# this will make the experiment much more reproducible!
python -m venv venv
source venv/bin/activate
pip install cleanrl
python cleanrl/ppo.py \
--seed 1 \
--gym-id CartPole-v0 \
--total-timesteps 50000
# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs
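To repeat the run above over several seeds, one option is to generate the commands from Python. The sketch below only builds and prints the command lines, using the `--seed`, `--gym-id` and `--total-timesteps` flags shown above. `build_command` is a hypothetical helper, not part of CleanRL.

```python
import shlex


def build_command(seed, gym_id="CartPole-v0", total_timesteps=50000):
    """Return the argv list for one run of cleanrl/ppo.py with the given seed."""
    return [
        "python", "cleanrl/ppo.py",
        "--seed", str(seed),
        "--gym-id", gym_id,
        "--total-timesteps", str(total_timesteps),
    ]


if __name__ == "__main__":
    for seed in (1, 2, 3):
        # Print each command; pass the list to subprocess.run(...) to execute it.
        print(shlex.join(build_command(seed)))
```

Keeping the command as a list avoids shell quoting problems if it is later handed to `subprocess.run`.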
To use the wandb integration, sign up for an account at https://wandb.com and copy the API key. Then run:
source venv/bin/activate
wandb login # only required for the first time
python cleanrl/ppo.py \
--seed 1 \
--gym-id CartPole-v0 \
--total-timesteps 50000 \
--prod-mode \
--wandb-project-name cleanrltest
The following instructions assume a Linux environment. We first install the dependencies:
# install atari, pybullet, procgen, box2d, pettingzoo
source venv/bin/activate
pip install cleanrl[all]
# if you are using zsh, you will need to do
# `pip install cleanrl\[all\]`
# install mujoco
curl -OL https://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip -d ~/mujoco/
mv ~/mujoco/mujoco200_linux ~/mujoco/mujoco200
rm mujoco200_linux.zip
pip install gym[mujoco]
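Before moving on, it can help to confirm that the archive ended up where the commands above put it. The following sketch only checks for the `~/mujoco/mujoco200` directory produced by those commands. `mujoco_dir` is a hypothetical helper, and the check says nothing about whether gym can actually load MuJoCo.

```python
from pathlib import Path


def mujoco_dir(root=None):
    """Return the mujoco200 directory under `root`, or None if it is missing."""
    # Default location assumed from the install commands above: ~/mujoco
    base = Path(root) if root is not None else Path.home() / "mujoco"
    candidate = base / "mujoco200"
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    found = mujoco_dir()
    print(found if found else "mujoco200 not found under ~/mujoco")
```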
Now we can run the experiments:
source venv/bin/activate
cd cleanrl
# atari
python dqn_atari_visual.py --gym-id BeamRiderNoFrameskip-v4
# pybullet
python td3_continuous_action.py --gym-id MinitaurBulletDuckEnv-v0
# procgen
python ppo_procgen_fast.py --gym-id starpilot
# box2d
python experiments/ppo_car_racing.py
# pettingzoo
python ppo_pettingzoo.py
Open RL Benchmark by CleanRL is a comprehensive, interactive and reproducible benchmark of deep Reinforcement Learning (RL) algorithms. It uses Weights and Biases to keep track of the experiment data of popular deep RL algorithms (e.g. DQN, PPO, DDPG, TD3) in a variety of games (e.g. Atari, Mujoco, PyBullet, Procgen, Griddly, MicroRTS). The experiment data includes:
- reproducibility info:
- metrics:
Open RL Benchmark has over 1000 experiments, including runs from other projects, which is too many to present in a single report. Instead, we present the results in separate reports. Please click on the links below to access them.
- Atari results
- Mujoco results
- PyBullet results
- Procgen results
- Griddly results
- Gym-μRTS results
- Slimevolleygym results
- PySC2 results
- CarRacing-v0
- Montezuma's Revenge results
We hope it brings a new level of transparency, openness, and reproducibility. Our plan is to benchmark as many algorithms and games as possible. If you are interested, please join us and contribute more algorithms and games. To get started, check out our contribution guide and our roadmap for the Open RL Benchmark.
Check out the documentation here
We have a Discord community for support. Feel free to ask questions. Posts in GitHub Issues and PRs are also welcome. Our past video recordings are also available on YouTube.
We have a short contribution guide at https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md. Consider adding new algorithms or testing new games on the Open RL Benchmark (https://benchmark.cleanrl.dev).
Big thanks to all the contributors of CleanRL!
Please consider using the following BibTeX entry:
@misc{cleanrl,
author = {Shengyi Huang and Rousslan Dossa and Chang Ye},
title = {CleanRL: High-quality Single-file Implementation of Deep Reinforcement Learning algorithms},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/vwxyzjn/cleanrl/}},
}
I have been heavily inspired by many repos and blog posts. Below is an incomplete list of them.
- http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/
- https://github.com/seungeunrho/minimalRL
- https://github.com/Shmuma/Deep-Reinforcement-Learning-Hands-On
- https://github.com/hill-a/stable-baselines
The following ones helped me a lot with the continuous action space handling: