This repository implements the meta-MADDPG algorithm presented in the paper "Improving Scalability in Applying Reinforcement Learning into Multi-robot Scenarios". It is configured to run with environments from the Multi-Agent Particle Environments (MPE).
Paper: Improving Scalability in Applying Reinforcement Learning into Multi-robot Scenarios.
Environment: multiagent-particle-envs. Training and testing are based on an instance of the environment named "simple_tag_non_adv_4.py".
- Build the MPE environment.
# go to the multiagent-particle-envs directory
cd multiagent-particle-envs
# build MPE
python setup.py install
# (optional) if you change the MPE code, remove the old build, uninstall the package, and reinstall it
rm -rf build
pip uninstall multiagent
python setup.py install
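After the install step, a quick check that the package is visible to the interpreter can save a failed training run. The following is a minimal sketch, assuming the installed package is named `multiagent` (the name used in the `pip uninstall multiagent` step above); the helper `is_installed` is a hypothetical name introduced here for illustration.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# "multiagent" is assumed to be the package name installed by setup.py.
if is_installed("multiagent"):
    print("multiagent is importable")
else:
    print("multiagent not found; rerun `python setup.py install`")
```

If the check fails after a rebuild, it may help to confirm that the same Python interpreter is used for both the install and the training scripts.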
- Execute the main program and train a model with 4 or 5 agents.
Note 1: Pay special attention to the file paths in the code, and adjust the execution mode as needed.
# You can also change the running mode by editing these flags in the code:
# activate_meta_actor = True
# initial_train = False
# test_initial = False
python main_4_non_meta.py
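Editing the flags in the source file works, but the same three switches could also be exposed on the command line. The sketch below is not part of the repository: `parse_mode_flags` and `str2bool` are hypothetical helpers, and only the flag names and their default values are taken from the comments above.

```python
import argparse


def str2bool(text: str) -> bool:
    """Parse common textual spellings of a boolean."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def parse_mode_flags(argv=None) -> argparse.Namespace:
    """Build a hypothetical CLI for the three mode flags and parse `argv`."""
    parser = argparse.ArgumentParser(description="Select the running mode.")
    # Defaults mirror the values shown in the README comments.
    parser.add_argument("--activate_meta_actor", type=str2bool, default=True)
    parser.add_argument("--initial_train", type=str2bool, default=False)
    parser.add_argument("--test_initial", type=str2bool, default=False)
    return parser.parse_args(argv)


# Example: switch on initial training while keeping the other defaults.
flags = parse_mode_flags(["--initial_train", "true"])
print(flags.activate_meta_actor, flags.initial_train, flags.test_initial)
```

Passing an explicit list to `parse_mode_flags` keeps the example independent of the real command line; a script would normally call it with no argument so that `sys.argv` is used.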
- Train the meta actor and meta critic models.
Note 1: Pay special attention to the file paths in the code, and adjust the execution mode as needed.
Note 2: Depending on design needs, the code provides two meta variants: one with an RNN structure and one without.
python meta_actor.py
# or
python meta_actor_rnn.py
python meta_critic.py
# or
python meta_critic_rnn.py
make
- Evaluate the meta model and make a figure.
# After training is complete, disable the random action process,
# run each agent's actor model, and obtain the execution results.
python test_meta_actor.py
# Evaluate each mode; the statistics mainly include the number of collisions and the shortest distance ratio.
python evaluate.py
# Finally, output a figure of the results.
python print_figure.py
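To make the collision statistic concrete, here is a minimal sketch of counting collisions at a single time step. It assumes a circle-overlap test, where two entities collide when the distance between their centers is smaller than the sum of their radii, as MPE scenarios such as simple_tag do. The function names and sample values are hypothetical and are not taken from `evaluate.py`.

```python
import math
from itertools import combinations


def is_collision(pos_a, pos_b, radius_a, radius_b):
    """Return True if two 2D circles overlap (assumed collision rule)."""
    distance = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return distance < radius_a + radius_b


def count_collisions(positions, radii):
    """Count colliding pairs among all entities at one time step."""
    pairs = combinations(range(len(positions)), 2)
    return sum(
        1
        for i, j in pairs
        if is_collision(positions[i], positions[j], radii[i], radii[j])
    )


# Illustrative values only: the first two entities overlap, the third is far away.
positions = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
radii = [0.05, 0.05, 0.05]
print(count_collisions(positions, radii))
```

Summing this count over all steps of an episode would give a per-episode total. The shortest distance ratio is not sketched here because its definition is not given in this section.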
The five green spots are agents, the black spots are obstacles, the blue spots are targets, and the gray spot is a newcomer. Meta-application: when a newcomer enters the environment, the meta-actor network stored in the cloud can be downloaded to it, so that it can take urgent and suitable actions directly.