Multiple policies in MOA training
Closed this issue · 2 comments
As I understand it, in the initial experiments in the paper only a limited number of agents are trained with the MOA/causal influence reward. In the implementation (train_moa.py), it looks like all agents are equipped with the MOA model and receive a causal influence reward. It isn't immediately clear to me how to alter this to allow for variation in agent policies/models, since the Trainers postprocess and incorporate the causal rewards. Does anyone have insight or suggestions on how this might be done?
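In case it helps frame the question, here is a minimal sketch of the kind of setup I mean, using RLlib's multiagent config. The policy names, spaces, and `MOAPolicy` class are hypothetical placeholders rather than this repo's actual API, and the exact import path depends on the pinned Ray version:

```python
import gym
from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy

# Placeholder: substitute the repo's actual MOA policy class here.
MOAPolicy = A3CTFPolicy

# Example spaces; in practice these would come from the environment.
obs_space = gym.spaces.Box(low=0, high=255, shape=(15, 15, 3))
act_space = gym.spaces.Discrete(8)

def build_policies(num_agents, num_moa_agents, obs_space, act_space):
    """One named policy per agent; only the first num_moa_agents use the MOA class."""
    policies = {}
    for i in range(num_agents):
        cls = MOAPolicy if i < num_moa_agents else A3CTFPolicy
        policies[f"agent-{i}"] = (cls, obs_space, act_space, {})
    return policies

config = {
    "multiagent": {
        "policies": build_policies(num_agents=5, num_moa_agents=2,
                                   obs_space=obs_space, act_space=act_space),
        # Each env agent id (e.g. "agent-0") maps to its own named policy,
        # so MOA and baseline agents can coexist in one trainer.
        "policy_mapping_fn": lambda agent_id: agent_id,
    },
}
```

Since RLlib's postprocessing hook (`postprocess_trajectory`) is per-policy rather than global, I'd have guessed that moving the causal-reward injection into the MOA policy's postprocessing would keep baseline agents from ever receiving it, but I may be missing something about how the repo wires this up.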
The basic social influence experiment (number 1) has not been implemented in this repository. Only Experiment III: Modeling Other Agents is present, alongside the baseline A3C (and PPO) models.
Right, thanks for the clarification.