Vectorization techniques for fast population-based training.

Primary language: Python. License: Apache-2.0.

Fast Population-Based Reinforcement Learning

This repository contains the code for the paper "Fast Population-Based Reinforcement Learning on a Single Machine" (Flajolet et al., 2022) from InstaDeep 💻⚡.

First-time setup

Install Docker

This code requires Docker to run. To install Docker, please follow the online installation instructions. To run the code on a GPU, also install Nvidia-docker, along with the latest Nvidia driver available for your GPU.

Build and run a docker image

Once Docker and Nvidia-docker are installed, build the Docker image with the following command:

make build

and, once the image is built, start the container with:

make dev_container

Inside the container, you can run the nvidia-smi command to verify that your GPU is found.
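As a complementary check, you can confirm that JAX itself sees the GPU, not just the driver. The snippet below is a minimal sketch and not part of the repository's scripts; it assumes only that JAX is importable inside the container, and uses the public `jax.devices()` and `jax.default_backend()` calls.

```python
import jax

# List the devices JAX can use. On a working GPU setup this should include
# at least one GPU device; on a CPU-only setup it lists CPU devices.
devices = jax.devices()

# Name of the platform JAX will use by default, e.g. "cpu" or "gpu".
backend = jax.default_backend()

print(f"backend={backend}, devices={devices}")
```

If the backend is reported as "cpu" on a machine with a GPU, the container most likely cannot see the GPU or JAX was installed without GPU support.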

Run preconfigured scripts

Replicate the experiments from the paper

We provide scripts and commands to replicate the experiments discussed in the paper. All these commands are defined in the Makefile at the root of the repository.

To replicate the experiments corresponding to Figure 2 (where we measure the runtime of a population-wide update step with various implementations), run:

make run_timing_sactd3
make run_timing_dqn
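To give a feel for what "a population-wide update step with various implementations" means, here is a minimal JAX sketch contrasting a sequential loop over population members with a single vectorized call. It is an illustration under stated assumptions, not the code run by the timing scripts: the toy linear model, the plain SGD update, and the names `loss`, `sgd_step`, `sequential_step`, and `population_step` are all hypothetical.

```python
import jax
import jax.numpy as jnp


def loss(params, x, y):
    # Toy linear regression loss; a stand-in for an agent's actual loss.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


def sgd_step(params, x, y, lr):
    # One gradient step for a single population member.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


def sequential_step(params, x, y, lrs):
    # Baseline: update the members one after the other, then restack.
    members = [
        sgd_step({"w": params["w"][i], "b": params["b"][i]}, x[i], y[i], lrs[i])
        for i in range(lrs.shape[0])
    ]
    return {k: jnp.stack([m[k] for m in members]) for k in ("w", "b")}


# Vectorized: map the same step over the leading (population) axis of the
# parameters, the data batches, and the per-member learning rates.
population_step = jax.jit(jax.vmap(sgd_step, in_axes=(0, 0, 0, 0)))

pop_size, batch_size, dim = 4, 8, 3
kw, kx, ky = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    "w": jax.random.normal(kw, (pop_size, dim)),
    "b": jnp.zeros((pop_size,)),
}
x = jax.random.normal(kx, (pop_size, batch_size, dim))
y = jax.random.normal(ky, (pop_size, batch_size))
lrs = jnp.array([1e-3, 3e-3, 1e-2, 3e-2])  # one learning rate per member

new_params = population_step(params, x, y, lrs)
reference = sequential_step(params, x, y, lrs)
```

Both versions compute the same update for every member. The vectorized one compiles the whole population update into a single call, which is the kind of difference a timing comparison would measure.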

To replicate the experiments discussed in Section 5 (which correspond to full training runs), run the following:

make run_td3_cemrl
make run_td3_dvd
make run_td3_pbt
make run_sac_pbt

Note that DvD training runs are unstable and sometimes crash early on due to NaNs.

We use TensorBoard to log metrics during training runs. The tensorboard command needed to visualize them is printed when an experiment starts.

Launch a test script

Run the following command to start a short test that checks that the training scripts work as expected:

make test_training_scripts

Citing this work

If you use the code or data in this package, please cite:

@inproceedings{flajolet2022fast,
  title={Fast Population-Based Reinforcement Learning on a Single Machine},
  author={Flajolet, Arthur and Monroc, Claire Bizon and Beguir, Karim and Pierrot, Thomas},
  booktitle={International Conference on Machine Learning},
  pages={6533--6547},
  year={2022},
  organization={PMLR}
}