meta-MADDPG

Introduction

This repository implements the meta-MADDPG algorithm presented in the paper "Improving Scalability in Applying Reinforcement Learning into Multi-robot Scenarios". It is configured to run with environments from the Multi-Agent Particle Environments (MPE).

Paper: Improving Scalability in Applying Reinforcement Learning into Multi-robot Scenarios.

Environment: multiagent-particle-envs (training and testing are based on an instance of the environment named "simple_tag_non_adv_4.py").

Dependency

pytorch
visdom
python 2

Install

Build MPE environment.

# go to the multiagent-particle-envs directory
cd multiagent-particle-envs
# build MPE
python setup.py install
# (optional) if you change the MPE code, remove the old build and reinstall
rm -rf build
pip uninstall multiagent
python setup.py install

Execute the main program and train a model of 4 agents or 5 agents

Note 1: Pay special attention to the file paths in the code, and adjust the execution mode as needed.

# You can change the running mode by editing these flags in the code:
#    activate_meta_actor = True
#    initial_train       = False
#    test_initial        = False
python main_4_non_meta.py

Train the meta actor and meta critic

Note 1: Pay special attention to the file paths in the code, and adjust the execution mode as needed.

Note 2: By design, the code contains two meta variants: one with an RNN structure and one without.
```
python meta_actor.py   #  or python meta_actor_rnn.py
python meta_critic.py  #  or python meta_critic_rnn.py
make
```

Evaluate the meta model and make a figure

# Once training is complete, disable the random action process,
# run each agent's actor model, and record the execution results.
python test_meta_actor.py
# Evaluate each mode; the statistics mainly include the number of collisions and the shortest distance ratio:
python evaluate.py
# Finally, output a figure of the results.
python print_figure.py
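
The two statistics named above can be illustrated with a small sketch. It is not taken from evaluate.py: the function names and the trajectory format are hypothetical, the collision test assumes circular entities that collide when their centers are closer than the sum of their radii, and the "shortest distance ratio" is read here as the straight-line distance divided by the length of the path actually traveled, which is only one plausible interpretation.

```python
import math


def is_collision(pos_a, size_a, pos_b, size_b):
    """Assumed rule: circles collide when center distance < sum of radii."""
    dist = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return dist < size_a + size_b


def count_collisions(trajectory, sizes):
    """Count colliding pairs, summed over all timesteps.

    trajectory: list of timesteps, each a list of (x, y) positions, one per entity.
    sizes: list of radii, one per entity (hypothetical input format).
    """
    total = 0
    for positions in trajectory:
        n = len(positions)
        for i in range(n):
            for j in range(i + 1, n):
                if is_collision(positions[i], sizes[i], positions[j], sizes[j]):
                    total += 1
    return total


def path_ratio(path):
    """Straight-line start-to-end distance over traveled length (assumed definition)."""
    traveled = sum(
        math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])
    )
    direct = math.hypot(path[-1][0] - path[0][0], path[-1][1] - path[0][1])
    # A path with no movement is treated as perfectly direct.
    return direct / traveled if traveled > 0 else 1.0


if __name__ == "__main__":
    steps = [
        [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)],  # entities 0 and 1 overlap
        [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)],   # no overlap
    ]
    print(count_collisions(steps, [0.05, 0.05, 0.05]))
    print(path_ratio([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

Under this reading, a ratio close to 1 means an agent moved almost directly to where it ended up, and a lower collision count means safer behavior.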

Result

The five green spots are agents, black spots are obstacles, blue spots are targets, and the gray spot is the newcomer. Meta-application: when a newcomer enters the environment, the meta-actor network in the cloud can be downloaded to the newcomer so that it can take emergent and suitable actions directly.

Four trained agents:
Idiot newcomer (the fifth agent arrives, and its actor network is idiot):
Meta newcomer (the fifth agent arrives, and its actor network directly loads the meta actor network):
