The repository covers the following algorithms:
- Proximal Policy Optimization (PPO) (Reinforcement Learning):
    - MLP and CNN policies
    - Gaussian policies (continuous actions) and categorical policies (discrete actions); see the sketch after this list
    - Benchmarks and environments covered (installation instructions below):
        - MuJoCo (OpenAI Gym)
        - DeepMind Control Suite
        - Safety Gym
        - Atari (OpenAI Gym)
        - Pycolab
- Generative Adversarial Imitation Learning (GAIL) (Imitation Learning):
    - MLP policies
    - Gaussian policies (continuous actions)
    - Benchmarks and environments covered (installation instructions below):
        - MuJoCo (OpenAI Gym)
- Expert demonstrations to mimic are made available at this link (installation instructions below).
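As a rough illustration of the Gaussian and categorical policy heads listed above, here is a minimal PyTorch sketch. The class and attribute names are placeholders and are not taken from this repository; the actual policies also include value heads and, for pixel observations, CNN trunks.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal


class GaussianPolicy(nn.Module):
    """Continuous actions: a diagonal Gaussian whose mean is state-dependent."""

    def __init__(self, ob_dim, ac_dim, hid=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(ob_dim, hid), nn.Tanh(),
                                   nn.Linear(hid, hid), nn.Tanh())
        self.mean_head = nn.Linear(hid, ac_dim)
        self.log_std = nn.Parameter(torch.zeros(ac_dim))  # state-independent std

    def forward(self, ob):
        mean = self.mean_head(self.trunk(ob))
        return Normal(mean, self.log_std.exp())


class CategoricalPolicy(nn.Module):
    """Discrete actions: a categorical distribution over the action set."""

    def __init__(self, ob_dim, num_actions, hid=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(ob_dim, hid), nn.Tanh(),
                                   nn.Linear(hid, hid), nn.Tanh())
        self.logits_head = nn.Linear(hid, num_actions)

    def forward(self, ob):
        return Categorical(logits=self.logits_head(self.trunk(ob)))
```

In both cases, PPO samples actions from the returned distribution and uses its `log_prob` in the clipped surrogate objective.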
The repository also offers implementations of the following:
- Random Network Distillation (RND) exploration reward bonus (intrinsic-motivation-based); see the sketch after this list
- Random Expert Distillation (RED) imitation reward (based on RND)
- Model-based imitation reward modulation, penalizing the agent when its controllability is lower (a metric aligned with the prediction error of the learned forward model, like curiosity, but in the opposite direction)
- Absorbing state wrapping mechanism as described in Discriminator-Actor-Critic (DAC)
- Positive-Unlabeled Reward Learning (PURL) discriminator training
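To make the RND bonus concrete, here is a minimal, hypothetical sketch; the network sizes and names are placeholders rather than the repository's actual implementation.

```python
import torch
import torch.nn as nn


class RNDBonus(nn.Module):
    """Random Network Distillation: a predictor network is trained to match a
    fixed, randomly initialized target network; the prediction error on a
    state serves as an exploration bonus."""

    def __init__(self, ob_dim, feat_dim=128, hid=256):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(ob_dim, hid), nn.ReLU(),
                                    nn.Linear(hid, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(ob_dim, hid), nn.ReLU(),
                                       nn.Linear(hid, feat_dim))
        for p in self.target.parameters():
            p.requires_grad = False  # the target stays random and frozen

    def bonus(self, ob):
        # High error => the predictor has rarely seen this state => high bonus.
        with torch.no_grad():
            return (self.predictor(ob) - self.target(ob)).pow(2).mean(dim=-1)

    def loss(self, ob):
        # Minimizing this on visited states shrinks the bonus over time.
        return (self.predictor(ob) - self.target(ob)).pow(2).mean()
```

RED follows the same distillation recipe, but the predictor is fit on the expert demonstrations only, so a low prediction error marks states that resemble the expert's and is turned into an imitation reward rather than an exploration bonus.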
Make sure you have GLFW and Open MPI installed on your system:
- If you are using macOS, run:
brew install open-mpi glfw3
- If you are using Ubuntu, run:
sudo apt -y install libopenmpi-dev libglfw3
Create a virtual environment for Python development using Anaconda or Miniconda:
- Create a conda environment for Python 3.7 called `myenv`, activate it, and upgrade `pip`:
conda create -n myenv python=3.7
conda activate myenv
# Once in the conda environment, upgrade the pip binary it uses to the latest version
pip install --upgrade pip
- Install various core Python libraries:
# EITHER with versions that were used for this release
pip install pytest==5.2.1 pytest-instafail==0.4.1 flake8==3.7.9 wrapt==1.11.2 pillow==6.2.1 six==1.15.0 tqdm==4.36.1 pyyaml==5.1.2 psutil==5.6.3 cloudpickle==1.2.2 tmuxp==1.5.4 lockfile==0.12.2 numpy==1.17.4 pandas==0.25.2 scipy==1.3.1 scikit-learn==0.21.3 h5py==2.10.0 matplotlib==3.1.1 seaborn==0.9.0 pyvips==2.1.8 scikit-image==0.16.2 torch==1.6.0 torchvision==0.7.0
conda install -y -c conda-forge opencv=3.4.7 pyglet=1.3.2 pyopengl=3.1.5 mpi4py=3.0.2 cython=0.29.13 watchdog=0.9.0
pip install moviepy==1.0.1 imageio==2.6.1 wandb==0.10.10
# OR without versions (pulls the latest versions for each of these releases)
pip install pytest pytest-instafail flake8 wrapt pillow six tqdm pyyaml psutil cloudpickle tmuxp lockfile numpy pandas scipy scikit-learn h5py matplotlib seaborn pyvips scikit-image torch torchvision
conda install -y -c conda-forge opencv pyglet pyopengl mpi4py cython watchdog
pip install moviepy imageio wandb
- Install MuJoCo following the instructions laid out at https://github.com/openai/mujoco-py#install-mujoco
- Build `mujoco-py` from source in editable mode:
git clone https://github.com/openai/mujoco-py.git
cd mujoco-py
pip install -e .
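As an optional sanity check, the import below should only succeed if MuJoCo and its license key were set up correctly in the previous step; the first import also triggers the compilation of mujoco-py's Cython extensions.

```python
# Fails with a descriptive error if MuJoCo or the license key cannot be found.
import mujoco_py
print(mujoco_py.__file__)
```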
- Build `gym` from source in editable mode (cf. http://gym.openai.com/docs/#installation):
git clone https://github.com/openai/gym.git
cd gym
pip install -e ".[all]"
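Optionally, a short smoke test using the pre-0.26 `gym` API that this setup targets; `Hopper-v2` is just one of the registered MuJoCo tasks and assumes the MuJoCo steps above succeeded.

```python
import gym

env = gym.make("Hopper-v2")
ob = env.reset()
for _ in range(10):
    # Random actions are enough to confirm the environment steps correctly.
    ob, reward, done, info = env.step(env.action_space.sample())
    if done:
        ob = env.reset()
env.close()
```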
- [Optional] Build `safety-gym` from source in editable mode (cf. https://github.com/openai/safety-gym#installation):
git clone https://github.com/openai/safety-gym.git
cd safety-gym
pip install -e .
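Optionally, verify the Safety Gym install; importing `safety_gym` registers its environments with `gym`, and the name below is one of the standard Safety Gym benchmark tasks.

```python
import gym
import safety_gym  # noqa: F401  # registers the Safexp-* environments

env = gym.make("Safexp-PointGoal1-v0")
ob = env.reset()
ob, reward, done, info = env.step(env.action_space.sample())
print(info.get("cost"))  # Safety Gym reports the constraint cost via `info`
env.close()
```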
- [Optional] Build `pycolab` from source in editable mode:
git clone https://github.com/deepmind/pycolab
cd pycolab
pip install -e .
- [Optional] Build `dm_control` from source in editable mode (cf. https://github.com/deepmind/dm_control#requirements-and-installation):
git clone https://github.com/deepmind/dm_control
cd dm_control
pip install -e .
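Optionally, a quick check that `dm_control` can see the MuJoCo libraries; `cartpole`/`swingup` is one of the standard Control Suite tasks.

```python
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()
time_step = env.reset()
for _ in range(5):
    # Sample a uniformly random action within the spec's bounds.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
print(time_step.reward)
```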
Download the expert demonstrations complementing this repository and make them accessible:
- Download the expert demonstrations that we have shared at this link;
- Place them at the desired location in your filesystem;
- Create the environment variable:
export DEMO_DIR=/where/you/downloaded/and/placed/the/demos
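Purely for illustration, this is how a script could resolve the variable and list the demonstrations it points to; the repository's own loading code may organize things differently.

```python
import os
from pathlib import Path

# Raises KeyError if DEMO_DIR has not been exported in the current shell.
demo_dir = Path(os.environ["DEMO_DIR"])
print(sorted(p.name for p in demo_dir.iterdir()))
```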
While one can launch any job via `main.py`, it is advised to use `spawner.py`, which is designed to spawn a swarm of experiments over multiple seeds and environments in a single command. To get its usage description, type `python spawner.py -h`.
usage: spawner.py [-h] [--config CONFIG] [--conda_env CONDA_ENV]
[--env_bundle ENV_BUNDLE] [--num_workers NUM_WORKERS]
[--deployment {tmux,slurm}] [--num_seeds NUM_SEEDS]
[--caliber {short,long,verylong,veryverylong}]
[--deploy_now] [--no-deploy_now] [--sweep] [--no-sweep]
[--wandb_upgrade] [--no-wandb_upgrade]
[--num_demos NUM_DEMOS [NUM_DEMOS ...]] [--debug]
[--no-debug] [--wandb_dryrun] [--no-wandb_dryrun]
[--debug_lvl DEBUG_LVL]
Job Spawner
optional arguments:
-h, --help show this help message and exit
--config CONFIG
--conda_env CONDA_ENV
--env_bundle ENV_BUNDLE
--num_workers NUM_WORKERS
--deployment {tmux,slurm}
deploy how?
--num_seeds NUM_SEEDS
--caliber {short,long,verylong,veryverylong}
--deploy_now deploy immediately?
--no-deploy_now
--sweep hp search?
--no-sweep
--wandb_upgrade upgrade wandb?
--no-wandb_upgrade
--num_demos NUM_DEMOS [NUM_DEMOS ...], --list NUM_DEMOS [NUM_DEMOS ...]
--debug toggle debug/verbose mode in spawner
--no-debug
--wandb_dryrun toggle wandb offline mode
--no-wandb_dryrun
--debug_lvl DEBUG_LVL
set the debug level for the spawned runs
Here is an example:
python spawner.py --config tasks/train_mujoco_ppo.yaml --env_bundle debug --wandb_upgrade --no-sweep --deploy_now --caliber short --num_workers 2 --num_seeds 3 --deployment tmux --conda_env myenv --wandb_dryrun --debug_lvl 2
Check the argument parser in `spawner.py` to learn what each of these arguments means and how to adapt them to your needs.