Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Anonymous code release for the ICLR 22 paper submission "Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning". This repository implements the Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms on the SMAC and Multi-agent MuJoCo benchmarks. HATRPO and HAPPO are the first trust region methods for multi-agent reinforcement learning with a theoretically justified monotonic improvement guarantee. Performance-wise, they set a new state of the art against rivals such as IPPO, MAPPO and MADDPG.


Create environment

conda create -n env_name python=3.9
conda activate env_name
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

Multi-agent MuJoCo

Follow the instructions in and to set up a MuJoCo environment. Finally, remember to set the following environment variables:


StarCraft II & SMAC

Run the script


Alternatively, you can install them manually to another path of your choice by following the instructions here:

How to run

Once your environment is ready, you can run the provided shell scripts. For example:

cd scripts
./  # run with HAPPO/HATRPO on Multi-agent MuJoCo
./  # run with HAPPO/HATRPO on StarCraft II

To change the experiment configuration, modify the .sh files or consult the config files for more details. To switch algorithms, change algo=happo to algo=hatrpo.
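As a minimal sketch of the algorithm switch, the snippet below builds a throwaway stand-in script and rewrites its algo line with sed. The file name train_demo.sh and its contents are hypothetical and exist only for this illustration; the scripts shipped in the repository have their own names and variables, so adapt the path and check that the target file really contains an `algo=happo` line before applying the same substitution.

```shell
# Hypothetical stand-in for one of the provided scripts (not a real repo file).
demo_dir="$(mktemp -d)"
demo_script="$demo_dir/train_demo.sh"
printf '%s\n' '#!/bin/sh' 'env=demo' 'algo=happo' 'echo "algo is ${algo}"' > "$demo_script"

# Write a modified copy with the algo assignment switched; the original stays untouched.
sed 's/^algo=happo$/algo=hatrpo/' "$demo_script" > "$demo_script.hatrpo"

# Print the assignment from the modified copy to confirm the change.
grep '^algo=' "$demo_script.hatrpo"
```

Writing to a copy rather than editing in place keeps the HAPPO configuration available for comparison runs.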

Some experiment results


Multi-agent MuJoCo on MAPPO