This is the code for the intention reading and generalization experiments from the paper "Do deep reinforcement learning agents model intentions?". It uses the simple_spread environment from the Multi-Agent Particle Environments (MPE).

- To install, `cd` into the root directory and type `pip install -e .`
- Known dependencies: OpenAI gym, tensorflow, numpy; scikit-learn and matplotlib are also needed for plotting.
- Download and install the MPE code by following its README.
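Before running anything, it can help to confirm that the dependencies listed above are importable. The snippet below is a minimal sketch, not part of this repository: the helper name `find_missing` is hypothetical, and the import names (for example `sklearn` for scikit-learn) are the usual ones for these packages. It does not check versions, and it does not check the MPE package.

```python
import importlib.util

# Usual import names for the dependencies listed above.
# This helper is hypothetical and is not shipped with the repository.
DEPENDENCIES = {
    "OpenAI gym": "gym",
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def find_missing(dependencies=DEPENDENCIES):
    """Return the names of dependencies whose module cannot be located."""
    missing = []
    for name, module in dependencies.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All known dependencies are importable.")
```

Because `find_spec` only locates the module without importing it, the check stays fast even for heavy packages such as tensorflow.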

To run the code, `cd` into the `experiments` directory and run:
- for basic MADDPG agents:
  `./experiment.sh coop_navi_0`
- for MADDPG + shared scheme, where all agents use a shared model:
  `./experiment.sh coop_navi_shared_0 --shared`
- for MADDPG + shuffle scheme, where agents are shuffled for each episode:
  `./experiment.sh coop_navi_shuffle_episode_0 --shuffle episode`
- for MADDPG + ensemble scheme, where agents are sampled for each episode:
  `./experiment_ensemble.sh coop_navi_ensemble_episode_0 --ensemble-choice episode`
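To make the per-episode shuffle and ensemble schemes more concrete, here is a minimal sketch of one plausible reading of them. It is an illustration only and is not taken from `train.py` or `ensemble.py`: the function names, the notion of environment "slots", and the ensemble size are all assumptions. Shuffling is modelled as a fresh random permutation of agents over slots at the start of each episode, and ensemble sampling as picking one ensemble member per slot at the start of each episode.

```python
import random


def shuffle_assignment(num_agents, rng):
    """Return a random permutation mapping environment slots to agent indices."""
    order = list(range(num_agents))
    rng.shuffle(order)
    return order


def sample_ensemble(num_agents, ensemble_size, rng):
    """For each slot, pick which ensemble member acts during this episode."""
    return [rng.randrange(ensemble_size) for _ in range(num_agents)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for episode in range(3):
        # Both assignments are redrawn once per episode, never mid-episode.
        slots = shuffle_assignment(3, rng)
        members = sample_ensemble(3, 3, rng)
        print(episode, slots, members)
```

Redrawing once per episode keeps each agent's role fixed within an episode while preventing it from relying on a fixed partner or position across episodes.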

The scripts are:
- `train.py`: basic training script, also used for evaluation
- `ensemble.py`: ensemble training script, also used for evaluation
- `learning_curve.py`: plots the learning curve of an experiment
- `statistics.py`: collects basic benchmark data from evaluation
- `prepare.py`: simplifies evaluation data for further processing
- `prepare_ensemble.py`: simplifies evaluation data for further processing, for ensemble results
- `accuracy.py`: calculates per-timestep target prediction accuracies
- `figure.py`: plots target prediction accuracies for all agents
- `sheldon.py`: runs evaluation against Sheldon agents (agents with fixed targets)
- `sheldon_ensemble.py`: runs evaluation against Sheldon agents, for ensemble results

For usage details, refer to `experiment.sh`, `experiment_ensemble.sh`, and the individual files.
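As an illustration of what a per-timestep target prediction accuracy means, the sketch below computes, for each timestep, the fraction of episodes in which the predicted target matches the true one. It is not the code in `accuracy.py`; the function name and the input layout (one list of landmark indices per episode) are assumptions made for this example.

```python
def per_timestep_accuracy(predicted, actual):
    """Fraction of episodes with a correct prediction at each timestep.

    predicted, actual: lists of episodes, each a list of landmark indices
    with one entry per timestep. All episodes must have the same length.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need the same, non-zero number of episodes")
    num_steps = len(predicted[0])
    accuracies = []
    for t in range(num_steps):
        # Count the episodes whose prediction at timestep t is correct.
        correct = sum(p[t] == a[t] for p, a in zip(predicted, actual))
        accuracies.append(correct / len(predicted))
    return accuracies


if __name__ == "__main__":
    predicted = [[0, 1, 1], [2, 2, 2]]  # two episodes, three timesteps each
    actual = [[1, 1, 1], [2, 2, 2]]
    print(per_timestep_accuracy(predicted, actual))  # [0.5, 1.0, 1.0]
```

Keeping the accuracy separate for each timestep shows how early in an episode a target becomes predictable, which a single pooled number would hide.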
If you used this code for your experiments or found it helpful, consider citing the following paper:

    @article{matiisen2018do,
      title={Do deep reinforcement learning agents model intentions?},
      author={Matiisen, Tambet and Labash, Aqeel and Majoral, Daniel and Aru, Jaan and Vicente, Raul},
      journal={arXiv preprint arXiv:1805.06020},
      year={2018}
    }

Thanks to OpenAI for the original MADDPG paper and for releasing the code.