Asynchronous Advantage Actor-Critic with Communication

Source code used in the paper Multi-Agent Reinforcement Deep Learning with Emergent Communication, published at IJCNN'19. The paper describes A3C2, an algorithm for multi-agent learning with communication.
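As an advantage actor-critic method, A3C2 weights its policy-gradient updates by advantage estimates. Below is a minimal sketch of the standard A3C n-step return and advantage computation. It is not taken from this repository. The function name n_step_advantages and the discount factor gamma are illustrative assumptions.

```python
from typing import List, Tuple


def n_step_advantages(rewards: List[float], values: List[float],
                      bootstrap_value: float, gamma: float = 0.99
                      ) -> Tuple[List[float], List[float]]:
    """Discounted n-step returns and advantages for one rollout (hypothetical helper).

    rewards[t]      : reward received after step t
    values[t]       : critic estimate V(s_t)
    bootstrap_value : V(s_T) for the state after the rollout (0.0 if terminal)
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards through the rollout: R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage A_t = R_t - V(s_t) scales the actor's policy-gradient term;
    # the critic is regressed towards R_t.
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages
```

For example, rewards [1.0, 0.0, 1.0], values [0.5, 0.5, 0.5], a terminal bootstrap of 0.0 and gamma=0.5 give returns [1.25, 0.5, 1.0] and advantages [0.75, 0.0, 0.5].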

Contains four environments (Hidden Reward, Navigation, Pursuit, and Traffic Intersection) and scripts that launch A3C2 to learn policies. Install the dependencies listed in requirements.txt, then run the scripts.

A video is available here.

Each agent is defined by 3 networks.

The algorithm is distributed: multiple workers asynchronously update the shared networks.
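The worker pattern can be illustrated with a minimal sketch, assuming a plain Python list as the shared parameter vector and a toy noisy quadratic loss in place of the real actor-critic loss. Each worker copies the global parameters, computes a gradient on its own data, and pushes that gradient back. SharedParams, worker, train, and goal are hypothetical names, not the repository's API.

```python
import random
import threading
from typing import List


class SharedParams:
    """Global parameter vector shared by all workers (stand-in for the shared networks)."""

    def __init__(self, size: int) -> None:
        self.w: List[float] = [0.0] * size
        self.lock = threading.Lock()

    def snapshot(self) -> List[float]:
        # Copy the current global parameters into a worker-local list.
        with self.lock:
            return list(self.w)

    def apply_gradients(self, grads: List[float], lr: float) -> None:
        # Plain SGD step on the global parameters.
        with self.lock:
            for i, g in enumerate(grads):
                self.w[i] -= lr * g


def worker(shared: SharedParams, goal: List[float], steps: int, lr: float, seed: int) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        local_w = shared.snapshot()  # 1. sync the local copy with the global parameters
        # 2. gradient of a toy noisy loss 0.5 * (w - goal - noise)^2, standing in for
        #    the loss computed on this worker's own rollout
        grads = [w - g - rng.gauss(0.0, 0.01) for w, g in zip(local_w, goal)]
        shared.apply_gradients(grads, lr)  # 3. push the (possibly stale) gradients


def train(num_workers: int = 4, steps: int = 200, lr: float = 0.05) -> List[float]:
    goal = [1.0, -2.0, 0.5]
    shared = SharedParams(len(goal))
    threads = [
        threading.Thread(target=worker, args=(shared, goal, steps, lr, seed))
        for seed in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.snapshot()
```

The lock keeps the sketch simple. Workers may still compute gradients from stale snapshots, which is the asynchrony A3C tolerates. The original A3C formulation applies updates without locking.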

Gradients are pushed across multiple time-steps to optimize the communication network and enforce communication.
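The following toy example shows how a receiver's loss at time t+1 yields a gradient for the sender's communication parameter at time t. It uses scalars, a tanh message, and a squared-error loss. All of these are assumptions for illustration, whereas A3C2 itself uses neural networks and actor-critic losses. forward, grads, train, c (sender's communication weight), and w (receiver's weight) are hypothetical names.

```python
import math
from typing import Tuple


def forward(c: float, w: float, obs_t: float, y_next: float) -> Tuple[float, float, float]:
    """Sender encodes a private observation at time t; receiver acts on it at t+1."""
    m_t = math.tanh(c * obs_t)  # message produced with the sender's communication weight c
    a_next = w * m_t            # receiver's output at t+1 depends on the message
    loss = 0.5 * (a_next - y_next) ** 2
    return m_t, a_next, loss


def grads(c: float, w: float, obs_t: float, y_next: float) -> Tuple[float, float]:
    """Chain rule from the receiver's loss at t+1 back to the sender's weight c."""
    m_t, a_next, _ = forward(c, w, obs_t, y_next)
    dloss_da = a_next - y_next
    dloss_dw = dloss_da * m_t                       # receiver's own parameter
    dloss_dm = dloss_da * w                         # gradient arriving at the message
    dloss_dc = dloss_dm * (1.0 - m_t ** 2) * obs_t  # pushed back into the sender
    return dloss_dc, dloss_dw


def train(steps: int = 500, lr: float = 0.2) -> Tuple[float, float]:
    c, w = 0.1, 0.1
    # Only the sender sees obs; the receiver must recover it from the message.
    data = [(1.0, 1.0), (-1.0, -1.0)]
    for _ in range(steps):
        for obs, y in data:
            gc, gw = grads(c, w, obs, y)
            c -= lr * gc
            w -= lr * gw
    return c, w
```

Because the receiver can only reach its target through the message, the gradient on c is what makes the sender learn to communicate. If the receiver ignored the message (w fixed at 0), c would receive no gradient at all.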
