This repo contains an extension of the MADDPG algorithm together with a simulator that combines particle-env and the OpenAI Gym Car environment.
MADDPG agents can learn complex rules that make them unable to cooperate with novel partners. My solution is to extend MADDPG with empowerment, an information-theoretic quantity, which gives agents an incentive to remain in control of, and responsive to, their partners.
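For intuition, below is a minimal sketch (not this repo's actual implementation) of the variational lower bound on empowerment, the mutual information I(a; s'|s) between an action and the next state: a learned "planner" q(a|s, s') tries to recover the action from its effect, and the gap between its log-probability and that of the action source lower-bounds the mutual information. All network names, sizes, and the data pipeline are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the variational (Barber-Agakov) lower bound on
# empowerment I(a; s' | s). Names, sizes and the data pipeline are
# illustrative assumptions, not this repo's actual implementation.

class Source(nn.Module):
    """Action source omega(a | s): proposes the actions whose effect we measure."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(), nn.Linear(hidden, a_dim))
        self.log_std = nn.Parameter(torch.zeros(a_dim))

    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

class Planner(nn.Module):
    """Variational posterior q(a | s, s'): tries to recover the action from its effect."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(2 * s_dim, hidden), nn.ReLU(), nn.Linear(hidden, a_dim))
        self.log_std = nn.Parameter(torch.zeros(a_dim))

    def dist(self, s, s_next):
        return torch.distributions.Normal(self.mean(torch.cat([s, s_next], dim=-1)),
                                          self.log_std.exp())

def empowerment_lower_bound(source, planner, s, a, s_next):
    """E[log q(a | s, s') - log omega(a | s)] <= I(a; s' | s).
    Ideally a is sampled from the source and s' from the environment; maximising
    this w.r.t. both networks tightens the bound, and its value can be used as
    an intrinsic bonus on top of the task reward."""
    log_q = planner.dist(s, s_next).log_prob(a).sum(-1)
    log_w = source.dist(s).log_prob(a).sum(-1)
    return (log_q - log_w).mean()
```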
To install, run:

```
pip install -e .
```
All training code is contained in `main.py`. To view the available options, run:

```
python main.py --help
```
To inspect the training loss in TensorBoard, activate the virtual environment and run:

```
tensorboard --logdir models/model_name
```
The moving agent has to navigate to a landmark of a particular color. However, it is blind, and a second agent sends messages that help it navigate. Since there are more landmarks than communication channels, the speaking agent cannot simply output a symbol corresponding to a particular color. If the listening agent is not receptive to the messages, the speaker ends up emitting random signals, which in turn forces the listener to ignore them. With empowerment, the agents remain responsive to one another (a sketch of this idea follows the training command below).
DDPG | MADDPG | EMADDPG |
---|---|---|
```
python main.py simple_speaker_listener3 maddpg+ve --recurrent --variational_transfer_empowerment
```
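One way such a mechanism could be wired in (an assumption, not necessarily the repo's exact scheme) is to mix the extrinsic task reward with a transfer-empowerment bonus, so the listener is rewarded for staying influenced by the speaker:

```python
import torch

# Hypothetical reward shaping for the listener. `bonus_fn` could be the
# `empowerment_lower_bound` sketch above; `beta` is an assumed trade-off
# coefficient, not a value taken from this repo.
def shaped_reward(r_extrinsic, bonus_fn, s, a, s_next, beta=0.1):
    with torch.no_grad():
        bonus = bonus_fn(s, a, s_next)
    return r_extrinsic + beta * float(bonus)
```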
In this simple task, the agents need to cover all landmarks. The MADDPG agents are trained by self-play, which causes them to agree on a rule; for example, agent 1 goes to the red landmark, agent 2 to the green one and agent 3 to the blue one. At test time, agent 1 is paired with agents 2 and 3 from a different run, so the old rule does not necessarily result in the most efficient landmark assignment. In contrast, EMADDPG uses empowerment, which leads each agent to pick the landmark closest to it (a cross-play evaluation sketch follows the training command below).
MADDPG | EMADDPG |
---|---|
```
python main.py maddpg+ve --recurrent --variational_joint_empowerment
```
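The cross-play evaluation described above can be sketched as follows; the policy-loading step and the `.act(obs)` interface are assumptions about how trained agents might be wrapped, not this repo's evaluation code.

```python
import numpy as np

# Hypothetical cross-play evaluation: pair agent 0 from one training run with
# agents 1 and 2 from a different run and roll out the task with the mixed team.
def cross_play_episode(env, policies, max_steps=25):
    """Roll out one episode with a mixed team and return the summed team reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        actions = [pi.act(o) for pi, o in zip(policies, obs)]
        obs, rewards, dones, _ = env.step(actions)
        total += float(np.sum(rewards))
        if all(dones):
            break
    return total

# e.g. policies = [load_policy("run_a", agent_id=0),   # `load_policy` is hypothetical
#                  load_policy("run_b", agent_id=1),
#                  load_policy("run_b", agent_id=2)]
```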
The cars need to stay on the road and avoid collisions. Each agent observes only a small top-view image and its own state, such as orientation and velocity.
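As a rough illustration of this observation structure (the image resolution and the exact state contents are assumptions, not taken from this repo), the per-agent observation could be described as a Gym `Dict` space:

```python
import numpy as np
from gym import spaces

# Illustrative only: a small top-view image plus the car's own state vector.
observation_space = spaces.Dict({
    "top_view": spaces.Box(low=0, high=255, shape=(32, 32, 3), dtype=np.uint8),
    "state": spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32),  # e.g. orientation, velocity
})
```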
Visual inputs:
Red Agent | Green Agent |
---|---|
Scenario | DDPG | MADDPG |
---|---|---|
Overtaking | | |
Obstacle avoidance | | |
Junctions | | |
Agent | Average distance | Collisions (%) |
---|---|---|
MADDPG | 1.767 | 20.9 |
EMADDPG | 0.180 | 2.01 |
The average distance to the landmark (lower is better) and the percentage of collisions between agents.
Agent | Target reached (%) | Average distance | Obstacle hits (%) |
---|---|---|---|
MADDPG | 84.0 | 2.233 | 53.5 |
EMADDPG | 98.8 | 0.012 | 1.90 |
A target counts as reached if the agent ends up within 0.1 of the target landmark (higher is better).
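For reference, the reported quantities could be computed from evaluation rollouts along these lines (the data layout is an assumption):

```python
import numpy as np

# Illustrative metric computation: `final_dists` holds each evaluation episode's
# final distance to the target landmark, `hits` holds 1 if an obstacle or agent
# was hit during the episode and 0 otherwise.
def summarize(final_dists, hits, reach_threshold=0.1):
    final_dists = np.asarray(final_dists, dtype=float)
    hits = np.asarray(hits, dtype=float)
    return {
        "target_reached_%": 100.0 * float(np.mean(final_dists < reach_threshold)),
        "average_distance": float(np.mean(final_dists)),
        "obstacle_hits_%": 100.0 * float(np.mean(hits)),
    }
```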