AMI


Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

This paper proposes Adversarial Minority Influence (AMI), the first practical black-box adversarial policy attack against cooperative multi-agent reinforcement learning (c-MARL). Technically, the adversary achieves strong attack capability by taking physically realistic actions that unilaterally fool the cooperative victims into worst-case cooperation.

Framework

To maximize the adversarial effect of the adversary on the victims, the Unilateral Influence Filter decomposes mutual information into a majority-influence term and a minority-influence term; the attack then discards the detrimental majority-influence term and encourages the adversary to affect the victims through minority influence alone.
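
The filtering idea can be illustrated with a minimal counterfactual sketch (our own illustration, not the repository's code; victim_policy and its conditioning interface are hypothetical): the minority-influence signal is the shift in a victim's action distribution caused by the adversary's action, and the reverse (majority) direction is simply never rewarded.

    import torch
    import torch.nn.functional as F

    def minority_influence(victim_policy, obs_v, a_adv, n_adv_actions):
        # Victim action distribution conditioned on the adversary's
        # actual action; victim_policy(obs, a_adv) -> logits is a
        # hypothetical interface, not the repository's API.
        p_cond = F.softmax(victim_policy(obs_v, a_adv), dim=-1)

        # Counterfactual marginal: average the victim's distribution
        # over all adversary actions (uniform prior, a simplification).
        p_all = torch.stack([
            F.softmax(victim_policy(obs_v, torch.full_like(a_adv, a)), dim=-1)
            for a in range(n_adv_actions)
        ])
        p_marg = p_all.mean(dim=0)

        # KL(p(a_v | s, a_adv) || p(a_v | s)): a one-directional
        # influence estimate; the majority term is dropped entirely.
        return (p_cond * (p_cond.log() - p_marg.log())).sum(dim=-1)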

To maximally reduce cooperation among the victims, the Targeted Adversarial Oracle generates a worst-case target action for each victim at every timestep, learning and adapting jointly with the adversarial policy, so that steering the victims toward these targets leads to the worst joint cooperation.
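
A hedged sketch of the oracle idea follows (class and reward names are ours, not the repository's): a learned network proposes a target action per victim, is trained elsewhere to minimize the victims' return, and the adversary receives an intrinsic reward for luring victims into those targets.

    import torch
    import torch.nn as nn

    class TargetedOracle(nn.Module):
        # Illustrative oracle: maps the global state to a worst-case
        # target action for each victim (not the repo's exact module).
        def __init__(self, state_dim, n_victims, n_actions):
            super().__init__()
            self.n_victims, self.n_actions = n_victims, n_actions
            self.net = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, n_victims * n_actions),
            )

        def forward(self, state):
            logits = self.net(state).view(-1, self.n_victims, self.n_actions)
            return logits.argmax(dim=-1)  # one target action per victim

    def adversary_intrinsic_reward(victim_actions, target_actions):
        # Reward the adversary when victims match the oracle's targets;
        # the oracle itself is updated (not shown) to minimize the
        # victims' team return, so the two learn cooperatively.
        return (victim_actions == target_actions).float().mean(dim=-1)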

Environment

The StarCraft II Multi-Agent Challenge (SMAC) is a widely used discrete-action environment for evaluating the effectiveness of multi-agent reinforcement learning.

To evaluate attacks fairly in the SMAC environment, we add an extra agent to the victims' side, train all agents together, and then let the adversarial policy control that added agent during the attack. The new maps are available in smac/smac/env/starcraft2/maps/SMAC_Maps.
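
For reference, loading a map through the standard SMAC API looks like the sketch below ("3m" is a stock map used as a placeholder; substitute one of the modified maps shipped in SMAC_Maps):

    from smac.env import StarCraft2Env

    # Requires StarCraft II and the SMAC maps to be installed.
    env = StarCraft2Env(map_name="3m")
    info = env.get_env_info()
    print(info["n_agents"], info["n_actions"])
    env.close()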

To comprehensively evaluate the attack performance of the adversary, we add three reward terms: (1) hit-point damage dealt to allies, (2) ally units killed, and (3) losing the game. The original version and our modified version are in src/config/envs/sc2.yaml and src/config/envs/sc2_policy.yaml, respectively.
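
For intuition only, a hypothetical combination of these terms into one adversary reward could look as follows (the weights and argument names are illustrative; the actual coefficients live in the YAML configs above):

    def adversary_reward(damage_to_allies, allies_killed, game_lost,
                         w_damage=1.0, w_kill=10.0, w_lose=200.0):
        # Hypothetical shaping: the adversary is paid for harming its
        # own team. Weights are placeholders, not the repo's values.
        return (w_damage * damage_to_allies
                + w_kill * allies_killed
                + w_lose * float(game_lost))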

Multi-agent Mujoco (MAMujoco) is a multi-agent variant of the MuJoCo environment in which robots learn to control their joints to move as fast as possible in the $+x$ direction.

In MAMujoco, the original robot is split into independent parts, each controlling a different set of joints. The reward follows MuJoCo and contains two terms, reward_run and reward_ctrl: reward_run rewards the robot for achieving high velocity in the $+x$ direction, while reward_ctrl penalizes large action magnitudes.

We define the adversary reward as the negative of reward_run, since reward_ctrl is an auxiliary reward used in normal training and does not reflect the adversary's goal.
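
A minimal sketch of this sign flip as a Gym wrapper, assuming the older 4-tuple step API and an info dict that exposes reward_run (as classic MuJoCo locomotion tasks do):

    import gym

    class AdversaryRewardWrapper(gym.Wrapper):
        # Replace the task reward with the negated running reward,
        # ignoring the auxiliary reward_ctrl term.
        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            adv_reward = -info["reward_run"]  # reward slow/backward motion
            return obs, adv_reward, done, info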

Installation

  1. Install SMAC

    cd ami_smac/smac/
    pip install -e .
  2. Install packages

    pip install -r requirements.txt
  3. Install StarCraft II

    sh install_sc2.sh

For installing MuJoCo itself, please refer to the mujoco-py installation guide.

Instructions for SMAC

cd ami_smac/

Training Victims

python -u src/main.py --config=mappo --env-config=sc2

Running Policy-based Attacks

AMI (ours):
python -u src/main.py --config=mappo_ami --env-config=sc2_policy

Baseline attacks:
python -u src/main.py --config=mappo_iclr --env-config=sc2_policy
python -u src/main.py --config=mappo_icml --env-config=sc2_policy
python -u src/main.py --config=mappo_usenix --env-config=sc2_policy
python -u src/main.py --config=mappo_gma --env-config=sc2_policy

Running Observation-based Attacks

AMI-driven attack:
python -u src/main.py --config=mappo_ami_state --env-config=sc2_policy

Baseline attack:
python -u src/main.py --config=mappo_state --env-config=sc2_policy

Results will be stored in ami_smac/src/results/, including:

  • results/models
  • results/tb_logs

Instructions for Multi-Agent MuJoCo

cd ami_mujoco/scripts/

Training Victims

sh train_mujoco.sh

Running Attacks

# adv_algo controls the attack algorithm. 
# Choices: mappo_iclr, mappo_icml, mappo_usenix, mappo_ami, mappo_fgsm, mappo_gma
sh adv_mujoco.sh

Results will be stored in ami_mujoco/scripts/results/.

Video

Videos are available in the video/ folder.