eugenevinitsky/sequential_social_dilemma_games

Agents condition their policy on other agents' actions

Closed this issue · 4 comments

Agents need to be able to choose their next action based on another agent's past action. This way we can assess the causal influence of one agent's action on another. This is necessary before we can do issue #13.

So, what does this actually entail? Should it be part of their state space?

Yeah, other agents' actions should be one-hot encoded and fed as input to each agent at the next timestep. I guess that would make them part of the observation space. I was thinking of trying to do this within RLlib, but let me know if you think it makes more sense to do it within the environment somehow.
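For concreteness, here is a rough sketch of what the environment-side version might look like. It assumes a flat observation vector and a discrete action space; the helper names (`one_hot`, `augment_observation`, `augmented_obs_dim`) and the agent ids are made up for illustration and are not part of this repo or of RLlib.

```python
import numpy as np


def one_hot(action, num_actions):
    """Return a one-hot float vector for a discrete action index."""
    vec = np.zeros(num_actions, dtype=np.float32)
    vec[action] = 1.0
    return vec


def augment_observation(own_obs, prev_actions, agent_id, num_actions):
    """Append one-hot encodings of every *other* agent's previous action.

    prev_actions is a hypothetical dict mapping agent id -> action index
    taken at the previous timestep. Agents are visited in sorted id order
    so the layout of the vector is the same on every step.
    """
    others = [
        one_hot(action, num_actions)
        for other_id, action in sorted(prev_actions.items())
        if other_id != agent_id
    ]
    flat_obs = np.asarray(own_obs, dtype=np.float32).ravel()
    return np.concatenate([flat_obs] + others)


def augmented_obs_dim(base_dim, num_agents, num_actions):
    """Size of the augmented observation; the network input must match this."""
    return base_dim + (num_agents - 1) * num_actions


# Three agents, four discrete actions, a 4-dimensional base observation.
prev_actions = {"agent-0": 2, "agent-1": 0, "agent-2": 3}
obs = augment_observation(np.zeros(4), prev_actions, "agent-0", num_actions=4)
```

On the first timestep there are no previous actions, so some placeholder (for example all-zero vectors) would be needed to keep the observation size constant.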

I think I'm a little confused. If the other agents' actions aren't built into the observation space, won't the neural network have the wrong number of inputs? How would you feed them in later?

Moving this to the other repo.