Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

This repository implements the M3DDPG algorithm (named "mmmaddpg" in the command-line options). The code is adapted from https://github.com/openai/maddpg

To install the Multi-Agent Particle Environments (MPE), please refer to https://github.com/openai/multiagent-particle-envs

To start training, run:

python train.py --scenario simple

--scenario: defines which environment in the MPE is to be used (default: "simple")
--max-episode-len: maximum length of each episode for the environment (default: 25)
--num-episodes: total number of training episodes (default: 60000)
--num-adversaries: number of adversaries in the environment (default: 0)
--good-policy: algorithm used for the 'good' (non-adversary) policies in the environment (default: "maddpg"; options: {"mmmaddpg", "maddpg", "ddpg"})
--adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"mmmaddpg", "maddpg", "ddpg"})
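
The options above can be mirrored in a small argparse sketch, which may help when scripting experiments or checking defaults. This is an illustration only, not the actual contents of train.py: the function name build_parser and the constant POLICY_CHOICES are hypothetical, the real script may define more options or declare them differently, and the scenario name "simple_adversary" in the usage lines is assumed to be available in your MPE installation.

```python
import argparse

# Policy names taken from the option list above.
POLICY_CHOICES = ["mmmaddpg", "maddpg", "ddpg"]


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented training options (not the real train.py)"
    )
    parser.add_argument("--scenario", type=str, default="simple",
                        help="which MPE environment to use")
    parser.add_argument("--max-episode-len", type=int, default=25,
                        help="maximum length of each episode")
    parser.add_argument("--num-episodes", type=int, default=60000,
                        help="total number of training episodes")
    parser.add_argument("--num-adversaries", type=int, default=0,
                        help="number of adversaries in the environment")
    parser.add_argument("--good-policy", type=str, default="maddpg",
                        choices=POLICY_CHOICES,
                        help="algorithm for the non-adversary policies")
    parser.add_argument("--adv-policy", type=str, default="maddpg",
                        choices=POLICY_CHOICES,
                        help="algorithm for the adversary policies")
    return parser


# Usage: parse an explicit argument list instead of sys.argv.
# "simple_adversary" is an assumed scenario name, used for illustration.
example_args = build_parser().parse_args([
    "--scenario", "simple_adversary",
    "--num-adversaries", "1",
    "--good-policy", "mmmaddpg",
    "--adv-policy", "maddpg",
])
print(example_args.scenario, example_args.good_policy, example_args.adv_policy)
```

argparse converts dashes in option names to underscores, so --max-episode-len is read back as args.max_episode_len. Restricting the two policy options with choices makes a misspelled algorithm name fail at parse time rather than partway through training.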
