Multi-Agent Reinforcement Learning for Cooperative Coordination

This repo contains an extension of the MADDPG algorithm together with a simulator that combines particle-env and OpenAI Gym Car.

MADDPG agents can learn complex rules, which makes them unable to cooperate with novel partners. My solution is to extend MADDPG with empowerment, an information-theoretic notion, giving agents the ability to stay in control.
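The idea, sketched below in a minimal and purely illustrative form (not the repo's actual API), is to augment each agent's task reward with an empowerment bonus during training; `empowerment_bonus` and `beta` are hypothetical names.

def shaped_reward(task_reward, obs, actions, next_obs, empowerment_bonus, beta=0.1):
    """Task reward plus a weighted empowerment term.

    empowerment_bonus is assumed to return a (variational) estimate of the
    mutual information between an agent's actions and its partner's future
    state; beta trades off task performance against staying in control.
    """
    return task_reward + beta * empowerment_bonus(obs, actions, next_obs)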

Requirements

pip install -e .

How to Run

All training code is contained within main.py. To view the options, simply run:

python main.py --help

If you want to check the training loss on TensorBoard, activate the virtual environment and run:

tensorboard --logdir models/model_name

Simulation Videos

Cooperative Communication

The moving agent needs to go to a landmark of a particular color. However, it is blind, and another agent sends messages that help it navigate. Since there are more landmarks than communication channels, the speaking agent cannot simply output a symbol corresponding to a particular color. If the listening agent is not receptive to the messages, the speaker outputs random signals, which in turn forces the listener to keep ignoring them. With empowerment, the agents remain reactive to one another.
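As a rough illustration (not the repo's exact --variational_transfer_empowerment implementation), transfer empowerment can be estimated with a variational lower bound on the mutual information between the speaker's message and the listener's next state. The network sizes and the unit-variance Gaussian likelihoods below are assumptions.

import torch
import torch.nn as nn

class TransferEmpowerment(nn.Module):
    """Variational lower bound on I(message; listener's next state | state)."""
    def __init__(self, state_dim, msg_dim, hidden=64):
        super().__init__()
        # Source: proposes a message given the current state.
        self.source = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, msg_dim))
        # Planner: recovers the message after seeing the listener's next state.
        self.planner = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, msg_dim))

    def lower_bound(self, state, message, next_state):
        # Unit-variance Gaussian log-likelihoods (up to constants).
        log_q = -0.5 * ((message - self.planner(torch.cat([state, next_state], -1))) ** 2).sum(-1)
        log_p = -0.5 * ((message - self.source(state)) ** 2).sum(-1)
        # The bound is large when the message is predictable from its effect
        # on the listener, i.e. when the listener actually reacts to it.
        return (log_q - log_p).mean()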

(Simulation videos: DDPG, MADDPG and EMADDPG.)
python main.py simple_speaker_listener3 maddpg+ve --recurrent --variational_transfer_empowerment

Cooperative Coordination

In this simple task, agents need to cover all landmarks. The MADDPG algorithm is trained by self-play, which causes the agents to agree upon a rule, for example: agent 1 goes to the red landmark, agent 2 to the green one and agent 3 to the blue one. At test time, agent 1 is paired with agents 2 and 3 from a different run, so the former rule does not necessarily result in the most efficient landmark assignment. In contrast, EMADDPG uses empowerment, which results in each agent picking the landmark closest to it.
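The evaluation described above can be pictured as cross-play: one agent's policy is taken from one training run, its partners from another. The sketch below is hypothetical and assumes a Gym-style multi-agent interface (per-agent observation and reward lists, a single done flag); it is not the repo's evaluation code.

def cross_play(env, policies_run_a, policies_run_b, episodes=100):
    """Average return of a mixed team: agent 0 from run A, the rest from run B."""
    team = [policies_run_a[0]] + list(policies_run_b[1:])
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            actions = [policy(o) for policy, o in zip(team, obs)]
            obs, rewards, done, _ = env.step(actions)
            total += sum(rewards)
        returns.append(total)
    return sum(returns) / len(returns)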

(Simulation videos: MADDPG and EMADDPG.)
python main.py maddpg+ve --recurrent --variational_joint_empowerment

Cooperative Driving

Cars need to stay on the road and avoid collisions. Each agent only obtains a small top-view image and its own state, such as orientation and velocity.
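One plausible way to handle such observations (an assumed architecture, not necessarily what this repo does) is to encode the top-view image with a small CNN and concatenate the embedding with the agent's own state vector before the policy network:

import torch
import torch.nn as nn

class CarObservationEncoder(nn.Module):
    """Encodes a small top-view image plus the car's own state into one vector."""
    def __init__(self, img_channels=3, state_dim=4, embed_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + state_dim, embed_dim)

    def forward(self, image, state):
        # image: (batch, channels, H, W); state: (batch, state_dim)
        features = torch.cat([self.cnn(image), state], dim=-1)
        return torch.relu(self.head(features))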

Visual inputs: the small top-view observations of the red agent and the green agent.

(Simulation videos: DDPG vs. MADDPG on overtaking, obstacle avoidance and junctions.)

Cooperative Coordination

Agent      Average dist.   Collisions %
MADDPG     1.767           20.9
EMADDPG    0.180           2.01

The average distance to a landmark (lower is better) and the percentage of collisions between agents (lower is better).

Cooperative Communication

Agent      Target reach %   Average distance   Obstacle hits %
MADDPG     84.0             2.233              53.5
EMADDPG    98.8             0.012              1.90

The target is reached if the agent is within a distance of 0.1 of the target landmark (higher is better).