Primary language: C++ | License: MIT

RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

Paper on arXiv | Live demo (browser) | Discord

Run tutorials on Binder | Documentation

[Two animations]
Trained on a 2020 MacBook Pro (M1) using RLtools TD3

[Animation]
Trained on a 2020 MacBook Pro (M1) using RLtools PPO

Benchmarks

Benchmarks of training the Pendulum swing-up using different RL libraries and across different devices (RLtools)

Benchmarks of the inference frequency for a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures).
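To make the measured quantity concrete, the following is a minimal sketch of how an inference frequency for a two-layer [64, 64] fully-connected network could be estimated. It is plain C++ and deliberately does not use the RLtools API, and it is not the code behind the benchmark above. The input and output dimensions (13 and 4), the ReLU activation, and all names (DenseLayer, Mlp, forward, measure_inference_frequency) are assumptions chosen for illustration.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

namespace inference_benchmark_sketch {
    // Hypothetical dimensions: only the two hidden layers of size 64 come from the text above.
    constexpr std::size_t INPUT_DIM = 13;
    constexpr std::size_t HIDDEN_DIM = 64;
    constexpr std::size_t OUTPUT_DIM = 4;

    // Dense layer with statically sized, zero-initialized, row-major weights.
    template <std::size_t IN, std::size_t OUT>
    struct DenseLayer {
        std::array<float, OUT * IN> weights{};
        std::array<float, OUT> biases{};
    };

    struct Mlp {
        DenseLayer<INPUT_DIM, HIDDEN_DIM> layer_1;
        DenseLayer<HIDDEN_DIM, HIDDEN_DIM> layer_2;
        DenseLayer<HIDDEN_DIM, OUTPUT_DIM> output_layer;
    };

    // output = W * input + b, optionally followed by ReLU (activation choice is an assumption).
    template <std::size_t IN, std::size_t OUT>
    void evaluate(const DenseLayer<IN, OUT>& layer, const std::array<float, IN>& input,
                  std::array<float, OUT>& output, bool apply_relu) {
        for (std::size_t row = 0; row < OUT; row++) {
            float acc = layer.biases[row];
            for (std::size_t col = 0; col < IN; col++) {
                acc += layer.weights[row * IN + col] * input[col];
            }
            output[row] = (apply_relu && acc < 0.0f) ? 0.0f : acc;
        }
    }

    inline std::array<float, OUTPUT_DIM> forward(const Mlp& mlp, const std::array<float, INPUT_DIM>& input) {
        std::array<float, HIDDEN_DIM> hidden_1{};
        std::array<float, HIDDEN_DIM> hidden_2{};
        std::array<float, OUTPUT_DIM> output{};
        evaluate(mlp.layer_1, input, hidden_1, true);
        evaluate(mlp.layer_2, hidden_1, hidden_2, true);
        evaluate(mlp.output_layer, hidden_2, output, false);
        return output;
    }

    // Returns forward passes per second, or 0 if the elapsed time is below the clock resolution.
    inline double measure_inference_frequency(const Mlp& mlp, std::size_t num_iterations) {
        std::array<float, INPUT_DIM> input{};
        float checksum = 0.0f;
        const auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < num_iterations; i++) {
            input[0] = static_cast<float>(i % 7) * 0.1f; // vary the input between iterations
            checksum += forward(mlp, input)[0];
        }
        const auto end = std::chrono::steady_clock::now();
        volatile float sink = checksum; // keep the loop from being optimized away
        (void)sink;
        const double seconds = std::chrono::duration<double>(end - start).count();
        return seconds > 0.0 ? static_cast<double>(num_iterations) / seconds : 0.0;
    }
}
```

On a microcontroller, std::chrono would typically be replaced by a hardware timer, and the resulting number depends strongly on compiler flags and on whether an optimized matrix-multiplication backend is used.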

Content

Algorithms

Algorithm | Example
TD3 | Pendulum, Car, MuJoCo Ant-v4, Acrobot
PPO | Pendulum, MuJoCo Ant-v4 (CPU), MuJoCo Ant-v4 (CUDA)
SAC | Pendulum (CPU), Pendulum (CUDA), Acrobot

Getting Started

The getting started documentation is divided into two parts: a tutorial on how RLtools works internally and replication instructions for the results from the paper.

Tutorial on RLtools internals

Chapter | Documentation | Interactive Notebook
0 | Overview | -
1 | Containers | Binder
2 | Multiple Dispatch | Binder
3 | Deep Learning | Binder
4 | CPU Acceleration | Binder
5 | MNIST Classification | Binder
6 | Deep Reinforcement Learning | Binder

Note: you can also run the tutorial (Jupyter Notebooks) locally using a single command:

docker run -p 8888:8888 rltools/documentation

After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering with the tutorial!

Cloning the repository

To build the examples from source (either in Docker or natively), first clone the repository. Cloning all submodules with git clone --recursive takes a lot of space and bandwidth, so we recommend cloning only the main repository, which contains all the standalone code for RLtools, and cloning the required sets of submodules later:

git clone https://github.com/rl-tools/rl-tools.git rl_tools

Cloning submodules

There are five classes of submodules:

  1. External dependencies (in external/)
    • E.g. HDF5 for checkpointing, TensorBoard for logging, or MuJoCo for the simulation of contact dynamics
  2. Examples/Code for embedded platforms (in embedded_platforms/)
  3. Redistributable dependencies (in redistributable/)
  4. Test dependencies (in tests/lib)
  5. Test data (in tests/data)

These sets of submodules can be cloned additively and independently of each other. For most use cases (e.g. most of the Docker examples) you should clone the submodules for external dependencies:

cd rl_tools
git submodule update --init --recursive -- external

The submodules for the embedded platforms, the redistributable binaries, and the test dependencies/data can be cloned in the same fashion (by replacing external with the appropriate folder from the enumeration above). Note: for the redistributable dependencies and the test data, make sure that git-lfs is installed (e.g. sudo apt install git-lfs on Ubuntu) and activated (git lfs install); otherwise only the metadata of the blobs is downloaded.

Docker

Docker is the most reproducible way to get started with RLtools, both for replicating the results and for modifying the code. In our experiments on Linux using the NVIDIA container runtime we were able to achieve close to native performance. Docker instructions & examples

Native

In comparison to running the release binaries or building from source in Docker, the native setup heavily depends on the configuration of the machine it is run on (installed packages, overwritten defaults, etc.). Hence we provide guidelines on how to set up the environment for research and development of RLtools. These should work on the default configuration of the particular platform but might not work out of the box if it has been customized.

Unix (Linux and macOS)

For maximum performance and malleability in research and development, we recommend running RLtools natively, e.g. on Linux or macOS. Since RLtools itself is dependency-free, the most basic examples don't need any platform setup. However, for an improved experience, we support HDF5 checkpointing and TensorBoard logging as well as optimized BLAS libraries, which come with some system-dependent requirements. Unix instructions & examples

Windows

Windows instructions & examples

Embedded Platforms

Inference & Training

Inference

Naming Convention

We use snake_case for variables/instances, functions as well as namespaces and PascalCase for structs/classes. Furthermore, we use upper case SNAKE_CASE for compile-time constants.
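The convention can be illustrated with a short sketch. The names below (naming_example, PendulumState, STATE_DIM, kinetic_energy, to_array) are hypothetical and chosen only for illustration; they are not taken from the RLtools codebase.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace naming_example { // namespace: snake_case

    // Compile-time constant: upper-case SNAKE_CASE
    constexpr std::size_t STATE_DIM = 2;

    // Struct: PascalCase; member variables: snake_case
    struct PendulumState {
        float angle = 0.0f;
        float angular_velocity = 0.0f;
    };

    // Function and parameters: snake_case
    inline float kinetic_energy(const PendulumState& pendulum_state, float moment_of_inertia) {
        assert(moment_of_inertia > 0.0f);
        return 0.5f * moment_of_inertia * pendulum_state.angular_velocity * pendulum_state.angular_velocity;
    }

    // The compile-time constant fixes the size of the flattened state.
    inline std::array<float, STATE_DIM> to_array(const PendulumState& pendulum_state) {
        return {pendulum_state.angle, pendulum_state.angular_velocity};
    }
}
```

An instance would then be declared in snake_case as well, e.g. naming_example::PendulumState initial_state;.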

Citing

When using RLtools in academic work, please cite our publication using the following BibTeX citation:

@misc{eschmann2023rltools,
      title={RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control}, 
      author={Jonas Eschmann and Dario Albani and Giuseppe Loianno},
      year={2023},
      eprint={2306.03530},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}