
Mini RL Lab

Easy Agent Experiments for Beginners

I wrote this set of scripts to help me research and experiment with the latest concepts in RL, and as a way to learn Python and PyTorch.

It is a setup and workflow that works well for me when debugging and experimenting with concepts such as agent algorithms, world models, planning, plasticity, and transformers. Other beginners might find it a useful starting point for their own experiments.

Mini RL Lab focuses on continuous-control, gym-like environments aimed at physical systems or expensive-to-sample simulations. This is my personal research interest, and it keeps the problem and architecture space tractable.

The basis is CleanRL's PPO and SAC agents (https://github.com/vwxyzjn/cleanrl), which I modified to:

  1. Separate the environment rollout and logging from the agent code. CleanRL's single-file approach is great, but I find this arrangement easier for experiments (see the sketch after this list)
  2. Simplify the code and improve performance where possible
  3. Use separate, specialised training scripts for different workflows
  4. Include algorithms and variants as baselines to compare against
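
For illustration, here is a minimal sketch of that arrangement (hypothetical names, assuming Gymnasium and PyTorch, not the repo's actual files): the agent owns its networks and update step, while the learn script owns the environment rollout and the logging.

import gymnasium as gym
import torch
import torch.nn as nn

class Agent:
    # Owns the networks, optimisers and update logic; knows nothing about the env loop
    def __init__(self, obs_dim, act_dim, device="cpu"):
        self.device = device
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)).to(device)

    def act(self, obs):
        with torch.no_grad():
            obs = torch.as_tensor(obs, dtype=torch.float32, device=self.device)
            return torch.tanh(self.actor(obs)).cpu().numpy()

    def update(self, batch):
        pass  # gradient step(s) on collected data go here

# Learn script: the rollout and logging live out here, not inside the agent
env = gym.make("Pendulum-v1")
agent = Agent(env.observation_space.shape[0], env.action_space.shape[0])
obs, _ = env.reset(seed=0)
episode_return = 0.0
for step in range(1000):
    obs, reward, terminated, truncated, _ = env.step(agent.act(obs) * env.action_space.high)
    episode_return += reward
    if terminated or truncated:
        print(f"step {step}: return {episode_return:.1f}")  # or log to TensorBoard
        obs, _ = env.reset()
        episode_return = 0.0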

Benefits

  1. Agents based on established, tried-and-tested baselines from CleanRL
  2. Agents are structured for easy experimentation, whilst staying ~"one file"
  3. Various performance considerations, such as minimising CPU<>GPU syncs and data transfers from the buffer (see the sketch after this list)
    1. Helpful for those of us limited to one workstation and a midrange GPU
  4. Inline comments document design choices and link to source papers
  5. The learn scripts implement many best practices I discovered as I went, from minor (data logging structure) to major (multiprocessing runs a number of parallel agents with different seeds, essential in RL)
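
As a concrete example of the sync point (a generic PyTorch pattern, not code lifted from this repo): calling .item() on a CUDA tensor forces a CPU<>GPU synchronisation on every step, so it is cheaper to keep logged scalars on the device and transfer them once, in a batch.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
losses = []
for step in range(100):
    loss = (torch.randn(256, device=device) ** 2).mean()  # stand-in for a training loss
    # losses.append(loss.item())   # would force a CPU<>GPU sync every step
    losses.append(loss.detach())   # stays on the GPU, no sync
losses = torch.stack(losses).cpu().numpy()  # one transfer at logging time
print(losses.mean())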

Prerequisites

Quickstart

Test a change quickly for major errors:

python learn_simple.py

Training run with multiple random seeds, logging to TensorBoard:

python learn_simple.py --log --seed 8 --name "testing X new feature"

Run a vectorised environment with CUDA and log:

python learn_vectorised.py --log --cuda

Use Bayesian optimisation to tune hyperparameter(s):

python hypertune.py

Usage Notes

  • ppo_baseline

    • Based on CleanRL's continuous PPO agent
    • Simplified for easy modification
    • Improved samples per second through small optimisations
  • sac_baseline

    • Based on CleanRL's continuous SAC agent
    • Simplified for easy modification
    • Removed CUDA <> CPU synchronisations for better performance
    • Variants: DroQ and CrossQ
  • Novel agents

    • crossq_cem

      • Based on sac_crossq with the actor replaced by a cross-entropy method (CEM) optimiser (a sketch of the idea follows this list)
      • Inspired by QT-Opt https://arxiv.org/abs/1806.10293 and TD-MPC2 https://arxiv.org/abs/2310.16828
      • Question: Can QT-Opt's performance improve to match TD-MPC2 using CrossQ's improvements to the Q functions?
      • Result: maybe, but the CEM actor is so compute-intensive that it is not clear this direction is worth pursuing
      • WIP, could be improved
    • sac_crossq_bro

      • Inspired by various papers showing that SAC can be improved with (a) more compute, (b) regularisation, and (c) simple design changes
      • Promising results; the first agent in Mini RL Lab that seems to reliably solve WalkerHardcore in <500k steps
      • WIP, not tuned or optimised yet
  • learn_simple.py

    • Multiple training runs in parallel using multiprocessing (the processes have independent agents and environments)
    • Few assumptions about environments; more easily compatible with RL envs approximating the OpenAI Gym API
    • Easy to edit and modify, design choices in comments
    • Use case: test performance of new feature on multiple environments with many random seeds in parallel
  • learn_vectorised.py

    • No multiprocessing, runs a single process
    • PPO seems to really need different hyperparameters when vectorised
    • Use case: check performance when vectorised
  • hypertune.py

    • Uses Bayesian optimisation to tune selected hyperparameters
    • Uses multiprocessing to run multiple evaluations in parallel
    • Implements a median pruner to stop badly performing runs early (a sketch of this pattern follows this list)
    • Use case: optimise a new hyperparameter
    • Hyperparameters in RL https://arxiv.org/abs/2306.01324 is a good reference for this
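
To show the idea behind crossq_cem's actor replacement, here is a generic cross-entropy method sketch (not the repo's code; q_fn and its signature are assumptions): sample candidate actions from a Gaussian, score them with the Q function, refit the Gaussian to the top-scoring elites, and repeat.

import torch

def cem_action(q_fn, obs, act_dim, iters=4, pop=64, n_elite=8, device="cpu"):
    # obs: 1-D observation tensor; q_fn(obs_batch, action_batch) -> (pop, 1) Q values
    mean = torch.zeros(act_dim, device=device)
    std = torch.ones(act_dim, device=device)
    obs_batch = obs.reshape(1, -1).repeat(pop, 1)
    for _ in range(iters):
        actions = (mean + std * torch.randn(pop, act_dim, device=device)).clamp(-1.0, 1.0)
        scores = q_fn(obs_batch, actions).squeeze(-1)
        elites = actions[scores.topk(n_elite).indices]
        mean, std = elites.mean(0), elites.std(0) + 1e-6  # refit the Gaussian to the elites
    return mean  # the mean of the final elite distribution is the chosen action

# Toy usage with a dummy Q function whose optimum is the zero action
dummy_q = lambda o, a: -(a ** 2).sum(dim=1, keepdim=True)
print(cem_action(dummy_q, torch.zeros(3), act_dim=2))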
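
hypertune.py's pattern (Bayesian-style search, a median pruner, parallel evaluations) can be sketched with Optuna; the section above does not say which library the repo actually uses, and the ten-step loop below is a hypothetical stand-in for a short training run.

import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)  # hyperparameter being tuned
    score = 0.0
    for step in range(10):               # stand-in for a short training run
        score += lr * (1.0 - 100 * lr)   # placeholder "learning curve"
        trial.report(score, step)
        if trial.should_prune():         # the median pruner stops badly performing runs early
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=25, n_jobs=2)  # n_jobs parallelises trials (threads here, not multiprocessing)
print(study.best_params)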