This repository contains two examples that use Alchemist as the simulator for multi-agent deep reinforcement learning.
More details about Alchemist, its execution model, and its main abstractions can be found in the primer repository and on the official website.
Each simulation is described in src/main/yaml/.
Configurations whose names start with training are the ones in which homogeneous Deep Q-Learning is deployed.
Simulations whose names start with test are the ones in which the agents directly use the learned Q table to choose their actions.
All agents move at each simulated time unit, so the collective advances together, one step at a time.
Both scenarios use the same training technique: a central deep Q-learner gathers the agents' experience and uses it to improve the shared Q table. At the end, the best Q table is shared with the whole collective.
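The centralized training loop described above can be sketched as follows. This is a minimal illustration, not the code used in this repository: it replaces the deep Q-network with a plain tabular update to keep the example short, and the names CentralLearner, record, improve, and greedy_action, as well as the values of alpha and gamma, are hypothetical.

```python
from collections import defaultdict


class CentralLearner:
    """Hypothetical central learner: one Q table shared by every agent.

    Assumption: a tabular Q-learning update stands in for the deep
    Q-network, only to illustrate the data flow.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.buffer = []  # experience gathered from all agents

    def record(self, state, action, reward, next_state):
        """Called by any agent to hand its experience to the learner."""
        self.buffer.append((state, action, reward, next_state))

    def improve(self):
        """Apply one Q-learning update per stored transition, then clear the buffer."""
        for state, action, reward, next_state in self.buffer:
            target = reward + self.gamma * max(self.q[next_state])
            self.q[state][action] += self.alpha * (target - self.q[state][action])
        self.buffer.clear()

    def greedy_action(self, state):
        """Action an agent would pick at test time from the shared table."""
        values = self.q[state]
        return values.index(max(values))
```

In this sketch every agent calls record during training, while at test time the agents only call greedy_action on the table they received.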
In this scenario (described in testCohesion.yml and trainingCohesion.yml), a group of 20 agents learns to stay close to each other without colliding. Each agent has a fixed neighborhood (its four closest agents). The state of each agent consists of the relative distances to the agents in its neighborhood. Each agent can choose among 8 possible movement directions (north, south, east, ...).
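The state and action encoding of the cohesion scenario can be sketched as below. This is an illustration under assumptions, not the repository's implementation: positions are taken to be 2D tuples, the relative distance is represented as an (dx, dy) offset, and the ordering of the eight directions, the step length, and the function names are all hypothetical.

```python
import math

# Assumed ordering of the 8 directions: N, S, E, W, NE, NW, SE, SW.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, 1), (1, -1), (-1, -1)]


def closest_neighbors(me, others, k=4):
    """Return the k positions in `others` nearest to `me`, nearest first."""
    return sorted(others, key=lambda p: math.dist(me, p))[:k]


def state_of(me, others, k=4):
    """State of an agent: offsets from `me` to its k closest neighbors."""
    return [(x - me[0], y - me[1]) for x, y in closest_neighbors(me, others, k)]


def move(me, action, step=1.0):
    """Apply one of the 8 actions, moving `step` units in that direction."""
    dx, dy = DIRECTIONS[action]
    norm = math.hypot(dx, dy)  # normalise so diagonal moves have the same length
    return (me[0] + step * dx / norm, me[1] + step * dy / norm)
```

For example, state_of((0, 0), positions_of_other_agents) yields four offsets that could be flattened into the input of the Q-learner, and the index of the chosen output selects an entry of DIRECTIONS.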
This is the performance at the beginning of the training phase:
In the rest of the simulations, aggregate computing is combined with deep reinforcement learning. In particular, aggregate computing is used to share the leader's information across the system, and Q-learning is then used to learn how to follow the leader.
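One common way to spread leader information with aggregate computing is a gradient, in which every node repeatedly takes the minimum of its neighbors' distance estimates plus one. The sketch below assumes that this is the kind of information being shared; the source does not say which aggregate program is used. It is written in plain Python rather than in an aggregate programming language, and the function name, the hop-count metric, and the synchronous rounds are illustrative assumptions.

```python
import math


def gradient(neighbors, leader, rounds=None):
    """Hop-count distance from `leader`, computed by repeated local rounds.

    neighbors: dict mapping each node to the list of its neighbor nodes.
    Each round, a node only reads the previous estimates of its neighbors,
    which mimics how an aggregate program would evaluate on each device.
    """
    dist = {n: (0.0 if n == leader else math.inf) for n in neighbors}
    n_rounds = len(neighbors) if rounds is None else rounds
    for _ in range(n_rounds):
        dist = {
            n: 0.0
            if n == leader
            else min((dist[m] + 1.0 for m in neighbors[n]), default=math.inf)
            for n in neighbors
        }
    return dist
```

An agent could then use its own gradient value, or the values of its neighbors, as part of the state that the Q-learner observes when learning to follow the leader.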
This is the performance at the beginning of the training phase: