This repository contains the implementation of the ICML 2024 paper "Open Ad Hoc Teamwork with Cooperative Game Theory", including:
- environments: Wolfpack, LBF;
- experimental setups: various settings of agent-type sets for training and testing;
- algorithms: CIAO-S, CIAO-C and their variants, as well as the baseline algorithm GPL.
Demos of the experiments are available at https://sites.google.com/view/ciao2024.
1. Requirements (following the GPL settings)
(1) Set up a Python environment with Python 3.7.12.
(2) Install the required packages by executing the following command:
pip install -r requirements.txt
(3) Install the environments.
For either LBF or Wolfpack, install the environment using the following commands:
cd <Environment Name>/env
pip install -e .
NOTE: We recommend setting up two separate virtual environments for Wolfpack and LBF to avoid potential conflicts between these two experimental environments.
(4) A modified version of OpenAI gym is required.
To make the necessary modification to gym, first locate the directory of the gym package using
pip show gym
Assuming that the package is installed in <DIR>, replace <DIR>/gym/vector/async_vector_env.py with the async_vector_env.py we have provided. This can be done with the following command:
cp async_vector_env.py <DIR>/gym/vector/async_vector_env.py
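If you prefer to script this step, the sketch below does the same copy from Python. It is a minimal sketch, not part of the repository: the helper names `find_package_dir` and `patch_async_vector_env` are hypothetical, and keeping a `.orig` backup of the original file is our own addition. It uses `importlib.util.find_spec`, which locates the gym package directory (i.e. `<DIR>/gym`) without importing gym.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed package without importing it, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def patch_async_vector_env(patched_file: Path, gym_dir: Path) -> Path:
    """Copy the provided async_vector_env.py over <gym_dir>/vector/async_vector_env.py."""
    target = gym_dir / "vector" / "async_vector_env.py"
    if not target.parent.is_dir():
        raise FileNotFoundError(f"{target.parent} does not look like a gym install")
    # Hypothetical extra: keep the original file once so the patch can be reverted.
    backup = target.with_name(target.name + ".orig")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(patched_file, target)
    return target
```

Usage, from the directory containing the provided file: `patch_async_vector_env(Path("async_vector_env.py"), find_package_dir("gym"))`. Run it inside each virtual environment (Wolfpack and LBF), since each has its own gym install.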
The training code for the experiments is contained in the Wolfpack and LBF directories, respectively. A full description of the hyperparameters and the architecture used in this work is provided in the appendix of our paper.
For either environment, use the following commands to train CIAO-S, CIAO-C, GPL and the variants below. First, enter the algorithm directory:
cd <Environment Name>/algorithm
Then set the following environment variable:
export OMP_NUM_THREADS=1
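When launching runs from Python instead of an interactive shell, the variable has to be present in the child process's environment. The sketch below is a hypothetical launcher (`training_env` and `launch` are our names, not the repository's); by default it only returns the command (`dry_run=True`) and executes it with `subprocess.run` otherwise. Pinning OpenMP to one thread is commonly done to avoid CPU oversubscription when many environment workers run in parallel; that rationale is our assumption, not stated by the authors.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def training_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment (os.environ by default) with OMP_NUM_THREADS pinned to 1."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def launch(args: List[str], algorithm_dir: str, dry_run: bool = True) -> List[str]:
    """Run `python <args>` inside <Environment Name>/algorithm with OMP_NUM_THREADS=1.

    With dry_run=True the command is only returned, not executed.
    """
    cmd = [sys.executable, *args]
    if not dry_run:
        subprocess.run(cmd, cwd=algorithm_dir, env=training_env(), check=True)
    return cmd
```

For example, `launch(["main_mrf.py", "--graph=star"], "LBF/algorithm", dry_run=False)` would start a run with the variable set, leaving the parent shell's environment untouched.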
(1) Experiments in Sections 5.1 and 5.2: identical agent-type sets for training and testing
GPL
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="free" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C-ZI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="zero" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C-NI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="neg" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C-FI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C-NP
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="neg" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S-ZI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="zero" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S-NI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="neg" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S-FI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S-NP
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="neg" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
For the experiments with a maximum of 9 agents, the scripts of all algorithms are the same as above, except that --num_players_test=5 is changed to --num_players_test=9.
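The commands above differ only in four flags. The sketch below transcribes them into a table and rebuilds the argument list for any variant; `VARIANTS` and `build_command` are hypothetical helpers for illustration, not part of the repository. The argument list omits the shell quotes (`--graph="complete"` reaches the script as `--graph=complete`), so it can be passed directly to `subprocess.run`.

```python
from typing import Dict, List, Tuple

# (graph, weight_regularizer, pair_range, indiv_range), transcribed from the commands above.
VARIANTS: Dict[str, Tuple[str, float, str, str]] = {
    "GPL": ("complete", 0.0, "free", "free"),
    "CIAO-C": ("complete", 0.5, "pos", "pos"),
    "CIAO-S": ("star", 0.5, "pos", "pos"),
    "CIAO-C-ZI": ("complete", 0.5, "pos", "zero"),
    "CIAO-C-NI": ("complete", 0.5, "pos", "neg"),
    "CIAO-C-FI": ("complete", 0.5, "pos", "free"),
    "CIAO-C-NP": ("complete", 0.5, "neg", "pos"),
    "CIAO-S-ZI": ("star", 0.5, "pos", "zero"),
    "CIAO-S-NI": ("star", 0.5, "pos", "neg"),
    "CIAO-S-FI": ("star", 0.5, "pos", "free"),
    "CIAO-S-NP": ("star", 0.5, "neg", "pos"),
}


def build_command(variant: str, note: str, save_dir: str, num_players_test: int = 5) -> List[str]:
    """Assemble the main_mrf.py argument list for one Section 5.1/5.2 variant."""
    graph, reg, pair, indiv = VARIANTS[variant]
    return [
        "python", "main_mrf.py",
        f"--graph={graph}",
        f"--weight_regularizer={reg}",
        f"--pair_range={pair}",
        f"--indiv_range={indiv}",
        "--note", note,
        f"--save_dir={save_dir}",
        f"--num_players_test={num_players_test}",
    ]
```

For instance, `" ".join(build_command("CIAO-S-NP", "demo", "results", 9))` reproduces the CIAO-S-NP command with a maximum of 9 agents.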
(2) Experiments in Section 5.3: validating that solving the GPL optimization problem approximates the Bellman operator in OSB-CAG (with identical agent-type sets in training and testing)
Since the scripts for CIAO-C and CIAO-S are the same as those above for the maximum of 5 agents, we only show the scripts for CIAO-C-Va and CIAO-S-Va:
CIAO-C-Va
python main_mrf1.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --update_manner="variant"
CIAO-S-Va
python main_mrf1.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --update_manner="variant"
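Comparing these two commands with the CIAO-C and CIAO-S commands in (1), the -Va versions switch the script to main_mrf1.py and append --update_manner="variant"; everything else is unchanged. The hypothetical helper below (not part of the repository) captures exactly that difference, assuming the base command is given as an argument list without shell quotes.

```python
from typing import List


def to_va_command(base: List[str]) -> List[str]:
    """Turn a CIAO-C / CIAO-S main_mrf.py command into its -Va counterpart.

    Only two changes, read off the commands above: the script becomes
    main_mrf1.py and --update_manner=variant is appended.
    """
    if "main_mrf.py" not in base:
        raise ValueError("expected a main_mrf.py command")
    cmd = ["main_mrf1.py" if arg == "main_mrf.py" else arg for arg in base]
    return cmd + ["--update_manner=variant"]
```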
(3) Experiments in Section 5.4: Generalizability of CIAO with different agent-type sets in training and testing
Setting with --intersection_generalization:
GPL
python main_mrf1.py --graph="complete" --weight_regularizer=0.0 --pair_range="free" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --intersection_generalization
CIAO-C
python main_mrf1.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --intersection_generalization
CIAO-S
python main_mrf1.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --intersection_generalization
Setting with --exclusion_generalization:
GPL
python main_mrf1.py --graph="complete" --weight_regularizer=0.0 --pair_range="free" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclusion_generalization
CIAO-C
python main_mrf1.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclusion_generalization
CIAO-S
python main_mrf1.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclusion_generalization
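The six commands above form a 2 × 3 grid: two generalization settings times three algorithms, each run passing exactly one of the two flags. The sketch below enumerates that grid; `ALGORITHMS`, `SETTINGS` and `generalization_runs` are hypothetical names. You may want a distinct <NOTE> or <SAVE_DIR> per run so the results do not mix; the sketch leaves that choice to you.

```python
from itertools import product
from typing import Dict, List, Tuple

# (graph, weight_regularizer, pair_range, indiv_range) for the three algorithms above.
ALGORITHMS: Dict[str, Tuple[str, float, str, str]] = {
    "GPL": ("complete", 0.0, "free", "free"),
    "CIAO-C": ("complete", 0.5, "pos", "pos"),
    "CIAO-S": ("star", 0.5, "pos", "pos"),
}
# Each run passes exactly one of these flags, as in the listing above.
SETTINGS = ("--intersection_generalization", "--exclusion_generalization")


def generalization_runs(note: str, save_dir: str) -> List[Tuple[str, str, List[str]]]:
    """Return (setting, algorithm, argv) for all six Section 5.4 runs."""
    runs = []
    for setting, (name, (graph, reg, pair, indiv)) in product(SETTINGS, ALGORITHMS.items()):
        argv = [
            "python", "main_mrf1.py",
            f"--graph={graph}", f"--weight_regularizer={reg}",
            f"--pair_range={pair}", f"--indiv_range={indiv}",
            "--note", note, f"--save_dir={save_dir}",
            "--num_players_test=5", setting,
        ]
        runs.append((setting, name, argv))
    return runs
```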
(4) Experiments in Appendices I.2 and I.3: agent-type sets excluding the A2C agent (still identical in training and testing; LBF only)
GPL
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="free" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C-ZI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="zero" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C-NI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="neg" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C-FI
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C-NP
python main_mrf.py --graph="complete" --weight_regularizer=0.5 --pair_range="neg" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S-ZI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="zero" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S-NI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="neg" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S-FI
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="pos" --indiv_range="free" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S-NP
python main_mrf.py --graph="star" --weight_regularizer=0.5 --pair_range="neg" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
For the experiments with a maximum of 9 agents, the scripts of all algorithms are the same as above, except that --num_players_test=5 is changed to --num_players_test=9.
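The ablation names follow a pattern visible in the commands: the C/S letter selects the complete or star graph, and the suffix fixes the ranges (-ZI, -NI and -FI set --indiv_range to zero, neg and free; -NP sets --pair_range to neg). The sketch below uses that pattern to generate all 22 LBF runs of this appendix (11 algorithms × 2 values of --num_players_test). `ABLATIONS`, `GRAPHS` and `lbf_no_a2c_runs` are hypothetical names, not repository code.

```python
from typing import Dict, List, Tuple

# Suffix -> (pair_range, indiv_range), read off the ablation commands above.
ABLATIONS: Dict[str, Tuple[str, str]] = {
    "": ("pos", "pos"),      # plain CIAO-C / CIAO-S
    "-ZI": ("pos", "zero"),
    "-NI": ("pos", "neg"),
    "-FI": ("pos", "free"),
    "-NP": ("neg", "pos"),
}
GRAPHS = {"C": "complete", "S": "star"}


def lbf_no_a2c_runs(note: str, save_dir: str) -> Dict[Tuple[str, int], List[str]]:
    """All Appendix I.2/I.3 runs (LBF only), keyed by (variant name, num_players_test)."""
    configs = {"GPL": ("complete", 0.0, "free", "free")}
    for letter, graph in GRAPHS.items():
        for suffix, (pair, indiv) in ABLATIONS.items():
            configs[f"CIAO-{letter}{suffix}"] = (graph, 0.5, pair, indiv)
    runs = {}
    for name, (graph, reg, pair, indiv) in configs.items():
        for n in (5, 9):
            runs[(name, n)] = [
                "python", "main_mrf.py",
                f"--graph={graph}", f"--weight_regularizer={reg}",
                f"--pair_range={pair}", f"--indiv_range={indiv}",
                "--note", note, f"--save_dir={save_dir}",
                f"--num_players_test={n}", "--exclude_A2Cagent",
            ]
    return runs
```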
(5) Experiments in Appendix I.4: CIAO with no regularizers (identical agent-type sets in training and testing)
Since the scripts for CIAO-C and CIAO-S are the same as those for the experiments in Sections 5.1 and 5.2 above, we only show the scripts for CIAO-C-NR and CIAO-S-NR:
CIAO-C-NR
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-S-NR
python main_mrf.py --graph="star" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5
CIAO-C-NR
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=9
CIAO-S-NR
python main_mrf.py --graph="star" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=9
CIAO-C-NR
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-S-NR
python main_mrf.py --graph="star" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=5 --exclude_A2Cagent
CIAO-C-NR
python main_mrf.py --graph="complete" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=9 --exclude_A2Cagent
CIAO-S-NR
python main_mrf.py --graph="star" --weight_regularizer=0.0 --pair_range="pos" --indiv_range="pos" --note <NOTE> --save_dir=<SAVE_DIR> --num_players_test=9 --exclude_A2Cagent
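The eight NR commands above are the grid graph × --num_players_test × (with or without --exclude_A2Cagent), with --weight_regularizer=0.0 while keeping the positive pair and individual ranges of CIAO. Note that GPL also uses --weight_regularizer=0.0 but with free ranges, so CIAO-C-NR and GPL differ only in --pair_range and --indiv_range. The hypothetical helper `nr_runs` below enumerates that grid in the same order as the listing.

```python
from itertools import product
from typing import List


def nr_runs(note: str, save_dir: str) -> List[List[str]]:
    """The 8 CIAO-C-NR / CIAO-S-NR commands: A2C exclusion x num_players_test x graph."""
    runs = []
    for exclude_a2c, n, graph in product((False, True), (5, 9), ("complete", "star")):
        argv = [
            "python", "main_mrf.py",
            f"--graph={graph}",
            "--weight_regularizer=0.0",               # NR: regularizer switched off
            "--pair_range=pos", "--indiv_range=pos",  # ranges kept as in CIAO
            "--note", note, f"--save_dir={save_dir}",
            f"--num_players_test={n}",
        ]
        if exclude_a2c:
            argv.append("--exclude_A2Cagent")  # LBF only, as in item (4)
        runs.append(argv)
    return runs
```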
Aside from training the models, the training script also periodically checkpoints the model and evaluates it in the training and evaluation environments. Specifically, it runs several episodes under the evaluation setup and logs the resulting performance with TensorBoard. The logs can be viewed using the following command:
tensorboard --logdir=<Environment Name>/algorithm/runs
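With many runs under the same runs directory, it can help to check which runs have produced logs before opening TensorBoard. The sketch below relies only on the standard TensorBoard file naming (`events.out.tfevents.*`) and groups the event files by run directory; `find_event_files` is a hypothetical helper, and we do not assume any particular scalar tag names, which depend on the training code.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_event_files(logdir: str) -> Dict[str, List[str]]:
    """Map each run directory (relative to logdir) to its TensorBoard event files."""
    root = Path(logdir)
    runs: Dict[str, List[str]] = defaultdict(list)
    for f in sorted(root.rglob("events.out.tfevents.*")):
        runs[f.parent.relative_to(root).as_posix()].append(f.name)
    return dict(runs)
```

For example, `find_event_files("LBF/algorithm/runs")` lists one entry per run that has written at least one event file.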
If you use the results of this paper, please cite the following paper:
@article{wang2024open,
title={Open Ad Hoc Teamwork with Cooperative Game Theory},
author={Wang, Jianhong and Li, Yang and Zhang, Yuan and Pan, Wei and Kaski, Samuel},
journal={arXiv preprint arXiv:2402.15259},
year={2024}
}
If you have any questions about this paper, please email jianhong.wang@manchester.ac.uk.