Multiple policies in MOA training
Closed this issue · 2 comments
As I understand it, in the initial experiments in the paper only a limited number of agents are trained with the MOA/causal influence reward. In the implementation (train_moa.py), it looks like all agents are equipped with the MOA model and receive a causal influence reward. It isn't immediately clear to me how to alter this to allow for variation in agent policies/models, since the Trainers postprocess and incorporate the causal rewards. Does anyone have insight or suggestions on how this might be done?
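In case it helps frame the question, here is a minimal sketch of the kind of setup I mean, using RLlib's multiagent config. The policy names, spaces, and `MOAPolicy` class are hypothetical placeholders rather than this repo's actual API, and the exact import path depends on the pinned Ray version:

```python
import gym
from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy

# Placeholder: substitute the repo's actual MOA policy class here.
MOAPolicy = A3CTFPolicy

# Example spaces; in practice these would come from the environment.
obs_space = gym.spaces.Box(low=0, high=255, shape=(15, 15, 3))
act_space = gym.spaces.Discrete(8)

def build_policies(num_agents, num_moa_agents, obs_space, act_space):
    """One named policy per agent; only the first num_moa_agents use the MOA class."""
    policies = {}
    for i in range(num_agents):
        cls = MOAPolicy if i < num_moa_agents else A3CTFPolicy
        policies[f"agent-{i}"] = (cls, obs_space, act_space, {})
    return policies

config = {
    "multiagent": {
        "policies": build_policies(num_agents=5, num_moa_agents=2,
                                   obs_space=obs_space, act_space=act_space),
        # Each env agent id (e.g. "agent-0") maps to its own named policy,
        # so MOA and baseline agents can coexist in one trainer.
        "policy_mapping_fn": lambda agent_id: agent_id,
    },
}
```

Since RLlib's postprocessing hook (`postprocess_trajectory`) is per-policy rather than global, I'd have guessed that moving the causal-reward injection into the MOA policy's postprocessing would keep baseline agents from ever receiving it, but I may be missing something about how the repo wires this up.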
The basic social influence experiment (number 1) has not been implemented in this repository. Only Experiment III: Modeling Other Agents is present, alongside the baseline A3C (and PPO) models.
Right, thanks for the clarification.