Agents condition their policy on other agents' actions

Question

Closed this issue 6 years ago · 4 comments

Agents need to be able to choose their next action based on another agent's past action. That lets us assess the causal influence of one agent's actions on another's. This is a prerequisite for issue #13.

Answer 1 · 2019-01-15T21:45:18.000Z

So, what does this actually entail? Should it be part of their state space?

Answer 2 · 2019-01-17T02:16:14.000Z

Yeah, other agents' actions should be one-hot encoded and fed as input to each agent at the next timestep. I guess that would make them part of the observation space. I was thinking of trying to do this within RLlib, but let me know if you think it makes more sense to do it within the environment somehow.
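
A minimal sketch of the idea, assuming discrete actions and flat observation vectors. The helper names (`one_hot`, `augment_observation`, `augmented_size`) and the agent IDs are hypothetical and not part of RLlib or this repo. It reserves one one-hot slot per other agent from the start, so the observation length is fixed and can be declared up front; on the first timestep, when there are no previous actions, the slots are simply all zeros.

```python
from typing import Dict, List, Optional, Sequence


def one_hot(action: Optional[int], num_actions: int) -> List[float]:
    """Encode a discrete action as a one-hot vector; None gives all zeros."""
    vec = [0.0] * num_actions
    if action is not None:
        vec[action] = 1.0
    return vec


def augment_observation(
    agent_id: str,
    own_obs: Sequence[float],
    prev_actions: Dict[str, int],
    agent_ids: Sequence[str],
    num_actions: int,
) -> List[float]:
    """Append the other agents' previous actions (one-hot) to own_obs.

    Agents are visited in the fixed order given by agent_ids, so each
    other agent always occupies the same slot in the output vector.
    """
    augmented = list(own_obs)
    for other_id in agent_ids:
        if other_id == agent_id:
            continue  # an agent does not condition on its own action here
        augmented.extend(one_hot(prev_actions.get(other_id), num_actions))
    return augmented


def augmented_size(obs_size: int, num_agents: int, num_actions: int) -> int:
    """Length of the augmented observation, known before any step is taken."""
    return obs_size + (num_agents - 1) * num_actions


# Hypothetical setup: 3 agents, 3 discrete actions, 2-dimensional base observation.
agent_ids = ["agent-0", "agent-1", "agent-2"]
obs = augment_observation(
    "agent-0", [0.5, -1.0], {"agent-1": 2, "agent-2": 0}, agent_ids, num_actions=3
)
# First timestep: no previous actions yet, so the extra slots stay zero.
first_step = augment_observation("agent-0", [0.5, -1.0], {}, agent_ids, num_actions=3)
```

Whether this lives in the environment or in a wrapper on the RLlib side, the size from `augmented_size` is what the observation space would need to declare.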

Answer 3 · 2019-01-17T02:24:26.000Z

I think I'm a little confused; if it isn't built in as part of their observation space, won't the neural network have the wrong number of inputs? Like, how would you feed it in later?

Answer 4 · 2019-01-28T00:30:40.000Z

Moving this to the other repo.