Madrona RL Environments

Implementations of various RL environments in Madrona.


To use Madrona with a GPU, you need CUDA 11.7 or later and cmake 3.18 or later. These environments also require a conda installation (miniconda or anaconda).
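
As a quick sanity check before building, the version requirements above can be verified with a short script. This is only a sketch: it assumes `nvcc` and `cmake` are on your `PATH`, and the helper names (`parse_version`, `tool_version`, `meets`) are made up for this example and are not part of this repository.

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 7)    # minimum CUDA version stated above
MIN_CMAKE = (3, 18)   # minimum cmake version stated above


def parse_version(text):
    """Return the first 'major.minor' pair in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def tool_version(command, pattern):
    """Run command and extract a version with pattern; None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    match = re.search(pattern, result.stdout)
    return parse_version(match.group(1)) if match else None


def meets(found, minimum):
    """True if a version tuple was found and is at least the minimum."""
    return found is not None and found >= minimum


if __name__ == "__main__":
    # nvcc prints "... release X.Y, ..."; cmake prints "cmake version X.Y.Z"
    cuda = tool_version(["nvcc", "--version"], r"release (\d+\.\d+)")
    cmake = tool_version(["cmake", "--version"], r"cmake version (\d+\.\d+)")
    print("CUDA ", cuda, "ok" if meets(cuda, MIN_CUDA) else "missing or too old")
    print("cmake", cmake, "ok" if meets(cmake, MIN_CMAKE) else "missing or too old")
```

Versions are compared as integer tuples rather than strings, so that 3.9 correctly sorts below 3.18.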

To install miniconda (from miniconda3 instructions):

mkdir -p ~/miniconda3
wget -O ~/miniconda3/
bash ~/miniconda3/ -b -u -p ~/miniconda3
rm ~/miniconda3/
~/miniconda3/bin/conda init bash
# restart shell afterwards


conda create -n madrona python=3.10
conda activate madrona
pip install torch numpy tensorboard

git clone
cd madrona_rl_envs
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j
cd ..

pip install -e .

pip install -e overcooked_ai

NOTE: For cmake, you may need to specify the CUDA toolkit directory as follows:

cmake -D CUDAToolkit_ROOT=/usr/local/cuda-12.0 ..
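
If you are not sure which toolkit path to pass, the following sketch lists candidate directories and prints the matching cmake command for each. It assumes toolkits are installed under `/usr/local` (as in the example path above) and that a valid toolkit contains `bin/nvcc`. Both are assumptions about your machine, and the function names are hypothetical.

```python
import glob
import os


def find_cuda_roots(base="/usr/local"):
    """List directories under base that look like CUDA toolkits (contain bin/nvcc)."""
    candidates = sorted(glob.glob(os.path.join(base, "cuda*")))
    return [path for path in candidates
            if os.path.isfile(os.path.join(path, "bin", "nvcc"))]


def cmake_command(cuda_root):
    """Build the cmake invocation from the note above for a given toolkit path."""
    return ["cmake", "-D", f"CUDAToolkit_ROOT={cuda_root}", ".."]


if __name__ == "__main__":
    # Print one ready-to-copy command per toolkit found; run it from build/
    for root in find_cuda_roots():
        print(" ".join(cmake_command(root)))
```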

Running scripts

Before running any scripts, ensure that the madrona conda environment is active.
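
One way to guard against running in the wrong environment is to check the `CONDA_DEFAULT_ENV` variable, which conda sets on activation. The snippet below is a sketch of such a guard; `require_env` is a hypothetical helper and is not provided by the scripts in this repository.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


def require_env(expected="madrona", environ=os.environ):
    """Exit with a hint unless the expected conda environment is active."""
    active = active_conda_env(environ)
    if active != expected:
        sys.exit(f"Expected conda environment '{expected}', found {active!r}. "
                 f"Run: conda activate {expected}")
```

Calling `require_env()` at the top of a script stops it early with a clear message instead of failing later on a missing import.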

For cartpole:

cd scripts

# simulating the environment (madrona)
python --num-envs 32

# learning with madrona
python --num-envs 32 --madrona True --num-steps 200 --total-timesteps 160000

# baseline (numpy)
python --num-envs 32 --madrona False --num-steps 200 --total-timesteps 160000

For balance beam:

cd scripts

# simulating the environment (madrona)
python --num-envs 32

# learning with madrona
python --num-envs 1000 --num-steps 10 --total-timesteps 40000000 --update-epochs 4
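
To get a feel for how the flags above interact, the sketch below computes how many transitions each update collects and how many updates a run performs. It assumes the training scripts follow the common PPO convention where one update consumes `num_envs * num_steps` transitions. This README does not confirm that convention, so treat the numbers as an estimate.

```python
def rollout_budget(num_envs, num_steps, total_timesteps):
    """Return (batch_size, num_updates) under the assumed PPO convention."""
    batch_size = num_envs * num_steps             # transitions collected per update
    num_updates = total_timesteps // batch_size   # whole updates that fit in the budget
    return batch_size, num_updates


print(rollout_budget(32, 200, 160_000))       # cartpole flags     -> (6400, 25)
print(rollout_budget(1000, 10, 40_000_000))   # balance beam flags -> (10000, 4000)
```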

For overcooked:

cd scripts

# simulating the environment (madrona)
# choose between cramped_room, asymmetric_advantages_tomato, bonus_order_test (manually edit overcooked_example to add more layouts)
python --num-envs 32 --layout cramped_room

For hanabi:

cd scripts

# learning with madrona
MADRONA_MWGPU_KERNEL_CACHE=/tmp/hancache python --num-envs 1000 --num-steps 100 --num-updates 1000 --learning-rate 7e-4 --update-epochs 15 --num-minibatches 1 --madrona True --ent-coef 0.015 --anneal-lr False --hanabi-type full

Hanabi training ablations:

cd scripts

# Change the list on line 5 to the desired number of environments; each run lasts 30 minutes