A collection of trained Reinforcement Learning (RL) agents, with tuned hyperparameters, using Stable Baselines.
We are looking for contributors to complete the collection!
Goals of this repository:
- Provide a simple interface to train and enjoy RL agents
- Benchmark the different Reinforcement Learning algorithms
- Provide tuned hyperparameters for each environment and RL algorithm
- Have fun with the trained agents!
If the trained agent exists, then you can see it in action using:
```
python enjoy.py --algo algo_name --env env_id
```
For example, enjoy A2C on Breakout for 5000 timesteps:

```
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000
```
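Under the hood, enjoy.py boils down to loading the saved model and stepping through the environment. Here is a minimal sketch using the stable-baselines 2.x API (the model path is illustrative, and the script itself applies the proper environment wrappers):

```python
import gym

from stable_baselines import PPO2

# Illustrative path; trained agents live under trained_agents/<algo>/<env_id>.pkl
model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl")

env = gym.make("CartPole-v1")
obs = env.reset()
for _ in range(1000):
    # Deterministic actions usually showcase a trained policy best
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
```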
The hyperparameters for each environment are defined in `hyperparameters/algo_name.yml`.
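For reference, an entry in such a file maps an environment id to its tuned settings. A hypothetical sketch (the key names mirror the stable-baselines constructor arguments; check the actual .yml files for the real keys and values):

```yaml
# Hypothetical entry, for illustration only
CartPole-v1:
  n_envs: 8                 # number of parallel environments
  n_timesteps: !!float 1e5  # total training timesteps
  policy: 'MlpPolicy'       # stable-baselines policy class
  ent_coef: 0.0             # entropy coefficient passed to the algorithm
```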
If the environment exists in this file, then you can train an agent using:
```
python train.py --algo algo_name --env env_id
```
For example (with tensorboard support):
```
python train.py --algo ppo2 --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/
```
Train on multiple environments (with one call) and with tensorboard logging:

```
python train.py --algo a2c --env MountainCar-v0 CartPole-v1 --tensorboard-log /tmp/stable-baselines/
```
Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

```
python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i trained_agents/a2c/BreakoutNoFrameskip-v4.pkl -n 5000
```
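What train.py does maps onto the standard stable-baselines training calls. A minimal sketch, assuming stable-baselines 2.x and skipping the zoo's hyperparameter loading and wrappers:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Fresh training run with tensorboard logging (mirrors the CLI example above)
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
model = PPO2("MlpPolicy", env, tensorboard_log="/tmp/stable-baselines/", verbose=1)
model.learn(total_timesteps=int(1e5))
model.save("ppo2_CartPole-v1")

# Continuing training from a saved agent, as the -i flag does
model = PPO2.load("ppo2_CartPole-v1", env=env)
model.learn(total_timesteps=5000)
```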
Record a video of the agent for 1000 steps:

```
python -m utils.record_video --algo ppo2 --env BipedalWalkerHardcore-v2 -n 1000
```
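If you prefer to record without the helper script, gym's Monitor wrapper can write videos directly. A rough sketch, assuming an older gym release that still ships gym.wrappers.Monitor and that ffmpeg is installed:

```python
import gym

from stable_baselines import PPO2

# Monitor writes .mp4 files to the given directory (needs ffmpeg)
env = gym.wrappers.Monitor(gym.make("CartPole-v1"), "/tmp/videos/", force=True)

model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl")  # illustrative path
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```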
Scores can be found in `benchmark.md`. To compute them, simply run:

```
python -m utils.benchmark
```
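The reported score is essentially the mean episodic reward over a set of evaluation episodes. A hand-rolled sketch of that computation (the benchmark script itself may evaluate differently):

```python
import gym
import numpy as np

from stable_baselines import PPO2

model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl")  # illustrative path
env = gym.make("CartPole-v1")

episode_rewards = []
for _ in range(100):  # the number of evaluation episodes is a free choice here
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    episode_rewards.append(total_reward)

print("Mean reward: {:.2f} +/- {:.2f}".format(np.mean(episode_rewards), np.std(episode_rewards)))
```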
7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).
RL Algo | BeamRider | Breakout | Enduro | Pong | Qbert | Seaquest | SpaceInvaders |
---|---|---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ACER | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
ACKTR | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Additional Atari Games (to be completed):
RL Algo | MsPacman |
---|---|
A2C | ✔️ |
ACER | ✔️ |
ACKTR | |
PPO2 | ✔️ |
DQN | |
Classic Control Environments:

RL Algo | CartPole-v1 | MountainCar-v0 | Acrobot-v1 | Pendulum-v0 | MountainCarContinuous-v0 |
---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | | |
ACER | ✔️ | ✔️ | ✔️ | N/A | N/A |
ACKTR | ✔️ | ✔️ | ✔️ | N/A | N/A |
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ✔️ | ✔️ | ✔️ | N/A | N/A |
DDPG | N/A | N/A | N/A | ✔️ | ✔️ |
Box2D Environments:

RL Algo | BipedalWalker-v2 | LunarLander-v2 | LunarLanderContinuous-v2 | BipedalWalkerHardcore-v2 | CarRacing-v0 |
---|---|---|---|---|---|
A2C | ✔️ | | | | |
ACER | N/A | ✔️ | N/A | N/A | N/A |
ACKTR | N/A | ✔️ | N/A | N/A | N/A |
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | |
DQN | N/A | ✔️ | N/A | N/A | N/A |
DDPG | | N/A | ✔️ | | |
PyBullet Environments:

See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs. These are similar to the MuJoCo environments, but use a free simulator: pybullet. We are using the `BulletEnv-v0` versions.
RL Algo | Walker2D | HalfCheetah | Ant | Reacher | Hopper | Humanoid |
---|---|---|---|---|---|---|
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DDPG | | | | | | |
PyBullet Environments (continued):
RL Algo | Minitaur | MinitaurDuck | InvertedDoublePendulum | InvertedPendulumSwingup |
---|---|---|---|---|
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ |
DDPG | | | | |
You can train agents online using a Colab notebook.
Installation:

```
apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg
pip install stable-baselines==2.2.1 box2d box2d-kengz pyyaml pybullet==2.1.0 pytablewriter
```

Please see the Stable Baselines README for alternative installation methods.
Build docker image (CPU):

```
docker build . -f docker/Dockerfile.cpu -t rl-baselines-zoo-cpu
```

GPU:

```
docker build . -f docker/Dockerfile.gpu -t rl-baselines-zoo
```

Pull built docker image (CPU):

```
docker pull araffin/rl-baselines-zoo-cpu
```

GPU image:

```
docker pull araffin/rl-baselines-zoo
```

Run script in the docker image:

```
./run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1
```
To run tests, first install pytest, then:

```
python -m pytest -v tests/
```
If you trained an agent that is not present in the RL Zoo, please submit a Pull Request, including the tuned hyperparameters and the obtained score.