Multi-Agent-Coordination-Google-Football

Coordination between Deep RL Agents for Virtual Football

We examine several multi-agent reinforcement learning methods and their benefits in the context of the Google Research Football environment. Some background on the environment is useful first: its main purpose is to provide a simulator for reinforcement learning in an 11v11 football game. The environment also supports custom simulations and ships with several pre-built scenarios to experiment in. These smaller scenarios allow quicker iteration and experimentation because of their reduced compute cost. In our paper, we experimented on a combination of the provided smaller scenarios and our own custom-built scenarios in order to obtain a wide range of results to draw conclusions from.

We set up several experiments to compare the performance of different approaches to multi-agent RL (a configuration sketch is shown below). By default, the environment lets an agent control a single player on the field, but by modifying the setup we allowed N different agents to control separate players and be trained together. In a second experiment, the same N agents controlled the same players but were trained only individually, with the remaining players controlled by rule-based bots. In a third variant, a single agent jointly controlled all N players and output N actions at each step. We applied these setups to the scenarios we built, noting initial results and fine-tuning the algorithm as results came in.

From the data obtained, we concluded that N different agents trained simultaneously yielded the most stable, high-performing results. The other two methods reached similar performance peaks, but the single global agent took a long time to stabilize or did not stabilize at all, and the N individually trained agents showed much more variance at evaluation than the simultaneously trained ones. We therefore consider controlling multiple agents and training them simultaneously the best path forward for extended work.

As a next step, our primary aim is to extend the multi-agent setup to incorporate defenders, since our current setup only trains attackers. Doing so would expose the agents to a greater variety of states, resulting in more robust, generalizable agents.
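The snippet below is a minimal sketch (not the training code from the paper) of how the Google Research Football environment can be configured so that a policy controls several left-side players at once. The scenario name and the number of controlled players are illustrative choices, and a random policy stands in for the trained agents.

```python
# Minimal sketch: one environment, N controlled left-side players.
# Scenario name and NUM_CONTROLLED_PLAYERS are illustrative, not the paper's exact setup.
import gfootball.env as football_env

NUM_CONTROLLED_PLAYERS = 3

env = football_env.create_environment(
    env_name='academy_3_vs_1_with_keeper',   # one of the pre-built small scenarios
    representation='simple115',               # compact per-player feature vector
    number_of_left_players_agent_controls=NUM_CONTROLLED_PLAYERS,
    render=False,
)

obs = env.reset()   # with N > 1 controlled players, observations are stacked per player
done = False
while not done:
    # One discrete action per controlled player. A single "global" agent would
    # emit all N actions itself; N independent agents would each contribute one.
    actions = env.action_space.sample()
    obs, reward, done, info = env.step(actions)

env.close()
```

Setting `number_of_left_players_agent_controls` back to 1 recovers the default single-player setup used for the individually trained agents, with the remaining players handled by the built-in rule-based bots.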

Based on the Google Research Football environment: https://github.com/google-research/football