This repository contains the Python code for reproducing the experiments in the corresponding paper, which is currently under review at Distributed Autonomous Robotic Systems (DARS 2024).
Please make sure you have:
- Python 3.10+
- CUDA GPU(s)
Start by cloning the repository and its submodules, then enter the top-level directory:

```bash
git clone --recurse-submodules <>.git && cd marl-comms-optimisation
```
Set `WANDB_ENTITY` and `WANDB_PROJECT` for wandb logging in `config.py`.
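A minimal sketch of what those entries might look like (only these two names are given above; the values are placeholders to substitute with your own):

```python
# config.py (excerpt) -- hypothetical values, substitute your own.
WANDB_ENTITY = "your-wandb-entity"
WANDB_PROJECT = "marl-comms-optimisation"
```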
Each MARL suite requires its own setup. Use the relevant virtual environment for running the associated experiments.
- Create a new virtual environment: `python -m venv vmas_env && source vmas_env/bin/activate`
- Install dependencies: `pip install -r requirements_vmas.txt`
- Create directories to store weights and samples: `mkdir weights && mkdir samples`
In the sections that follow, substitute `<task>` with one of `norm_discovery`, `norm_swarm`, or `norm_flocking`.
Uniformly randomly sample 1M observations from the observation space of the chosen task:

```bash
python sample_vmas.py --scenario <task> --steps 1000000 --random --device cpu
```
The results will be saved to `samples/<task>_<time>.pt`.
Note: you can modify the number of agents you collect results for using the relevant task entry in `scenario_config.py`.
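To sanity-check a sample file before training, it can be loaded with PyTorch. This is a sketch only; it assumes the file holds a single tensor of stacked observations, which depends on how `sample_vmas.py` serialises them:

```python
import torch

# Substitute the actual filename written by sample_vmas.py.
samples = torch.load("samples/norm_discovery_<time>.pt", map_location="cpu")

print(type(samples))
if torch.is_tensor(samples):
    # Assumed layout: (num_samples, observation_dim).
    print(samples.shape, samples.dtype)
```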
Note: `--latent` is the expected latent dimension of the set autoencoder. To reproduce our work, this should be equal to the per-agent observation dimension multiplied by the number of agents.
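For example (illustrative numbers, not values from the paper):

```python
# Hypothetical task: per-agent observations of dimension 8 and 4 agents.
obs_dim = 8
n_agents = 4
latent = obs_dim * n_agents
print(latent)  # 32 -> pass as --latent 32
```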
```bash
python train_sae.py --scenario <task> --latent <> --data samples/<task>_<time>.pt
```
A state dict is saved every 2000 steps to `weights/sae_<task>_<epoch>_<time>.pt` and `weights/sae_<task>_latest.pt`.
Note: if you have multiple samples (e.g. if you want to train a strategy for 1, 2, and 3 agents), use `train_sae_scaling.py` instead, specifying a list like `--data sample1.pt sample2.pt sample3.pt`.
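To verify that a checkpoint was written correctly, the raw state dict can be inspected without instantiating the model (a sketch; the set autoencoder class lives in this repository and is not reproduced here):

```python
import torch

task = "norm_discovery"  # one of the tasks listed above
state_dict = torch.load(f"weights/sae_{task}_latest.pt", map_location="cpu")

# A state dict maps parameter names to weight tensors.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```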
Note: `--pisa_dim <>` is the dimension of each individual agent observation expected by the autoencoder.
```bash
python policy.py --train_specific --scenario <task> --pisa_dim <> --seed <> --home
```
As this policy trains, the learned task-specific communication strategy will be saved to `weights/sae_policy_wk<worker_index>_<time>.pt` every `--eval_interval` steps.
Training stats should be saved to `~/ray_results`, where they can be viewed with `tensorboard --logdir ~/ray_results`.
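If you prefer working with the raw numbers, Ray typically also writes a `progress.csv` per trial under `~/ray_results`. A sketch, assuming the default Ray Tune layout and the standard `episode_reward_mean` column (adjust to whatever columns you find):

```python
import glob
import os

import pandas as pd

pattern = os.path.expanduser("~/ray_results/**/progress.csv")
for path in glob.glob(pattern, recursive=True):
    df = pd.read_csv(path)
    if "episode_reward_mean" in df.columns:
        # Last logged value for this trial.
        print(path, df["episode_reward_mean"].iloc[-1])
```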
No-comms:

```bash
python policy.py --no_comms --scenario <task> --pisa_dim <> --seed <> --home
```

Task-agnostic:

```bash
python policy.py --task_agnostic --scenario <task> --pisa_dim <> --pisa_path weights/sae_<task>_latest.pt --seed <> --home
```

Task-specific:

```bash
python policy.py --task_specific --scenario <task> --pisa_dim <> --pisa_path weights/sae_policy_wk<worker_index>_<time>.pt --seed <> --home
```
Note: if you wish to train a policy with a different number of agents than the default (e.g. to test out-of-distribution performance of comms strategies), specify `--scaling_agents <>` with the number of agents you wish to use.
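To run all three variants over several seeds in one go, a small driver script can wrap the commands above. This is a hypothetical convenience sketch, not part of the repository; `task`, `pisa_dim`, the seeds, and the checkpoint paths (including `<worker_index>` and `<time>`) are placeholders to fill in:

```python
import subprocess

task = "norm_discovery"
pisa_dim = 8  # illustrative; must match your task's per-agent observation dim

variants = {
    "--no_comms": [],
    "--task_agnostic": ["--pisa_path", f"weights/sae_{task}_latest.pt"],
    # Fill in the actual worker index and timestamp of your checkpoint.
    "--task_specific": ["--pisa_path", "weights/sae_policy_wk<worker_index>_<time>.pt"],
}

for seed in (0, 1, 2):
    for flag, extra in variants.items():
        cmd = ["python", "policy.py", flag, "--scenario", task,
               "--pisa_dim", str(pisa_dim), "--seed", str(seed), "--home", *extra]
        subprocess.run(cmd, check=True)
```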
The instructions for this section are very similar to those for the VMAS experiments above.
- Change directory to the Melting Pot folder: `cd meltingpot-marlcomms`
- Create a new virtual environment: `python -m venv mp_env && source mp_env/bin/activate`
- Install dependencies: `pip install -r requirements_mp.txt && pip install -e .`
- Create directories to store weights and samples: `mkdir weights && mkdir samples`
Uniformly randomly sample 1M observations from the observation space of the chosen scenario:

```bash
python sample_meltingpot.py --scenario <task> --steps 1000000
```
The results will be saved to `samples/<task>_<time>.pt`.
```bash
python train_cnn.py --scenario <task> --data_path samples/<task>_<time>.pt --image_width <>
```
Weights will be saved periodically to `weights/cnn_<task>_best.pt`.
Note: `--latent` is the expected latent dimension of the set autoencoder. To reproduce our work, this should be equal to the per-agent observation dimension multiplied by the number of agents.
```bash
python train_pisa.py --scenario <task> --data_path samples/<task>_<time>.pt --cnn_path weights/cnn_<task>_best.pt
```
Weights will be saved periodically to `weights/pisa_<task>_best.pt`.
Note: `--pisa_dim <>` is the dimension of each individual agent observation expected by the set autoencoder.
```bash
python policy_mp.py --train_specific --scenario <task> --cnn_path weights/cnn_<task>_best.pt --pisa_dim <> --seed <>
```
As this policy trains, the learned task-specific communication strategy will be saved to `weights/pisa_policy_wk<worker_index>_<time>.pt` every `--eval_interval` steps.
Training stats should be saved to `~/ray_results`, where they can be viewed with `tensorboard --logdir ~/ray_results`.
No-comms:

```bash
python policy_mp.py --no_comms --scenario <task> --cnn_path weights/cnn_<task>_best.pt --pisa_dim <> --seed <>
```

Task-agnostic:

```bash
python policy_mp.py --task_agnostic --scenario <task> --cnn_path weights/cnn_<task>_best.pt --pisa_dim <> --pisa_path weights/pisa_<task>_best.pt --seed <>
```

Task-specific:

```bash
python policy_mp.py --task_specific --scenario <task> --cnn_path weights/cnn_<task>_best.pt --pisa_dim <> --pisa_path weights/pisa_policy_wk<worker_index>_<time>.pt --seed <>
```
If you have problems installing `dmlab2d`, please see the instructions at https://github.com/google-deepmind/lab2d to build it from source.
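Once installed, a quick import check confirms the build:

```python
# If this import succeeds, dmlab2d is installed correctly.
import dmlab2d

print(dmlab2d.__file__)
```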