MarcoMeter/episodic-transformer-memory-ppo

help: Question about supporting multi-agent environments

Closed this issue · 3 comments

Hello, I am very interested in your work and would like to use it in my experiments, but I don't know how to make it support multi-agent environments. I am a beginner, so my wording may not be precise; please bear with me.

In my environment, each step returns observations and rewards for multiple agents. The number of agents is fixed during training but can change arbitrarily at test time. It seems this could be viewed as returning multiple batch entries within a single step.
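For example, a single step in my environment looks roughly like this (a simplified sketch; the names and shapes are made up, not my real code):

```python
import numpy as np

NUM_AGENTS = 4  # fixed during training, may differ at test time

def env_step(actions):
    # One observation and one reward per agent, stacked along a leading agent axis
    obs = np.zeros((NUM_AGENTS, 84), dtype=np.float32)   # per-agent observations
    rewards = np.zeros(NUM_AGENTS, dtype=np.float32)     # per-agent rewards
    done = False                                          # shared episode termination flag
    return obs, rewards, done
```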

So how should I modify the code so that it integrates cleanly with the TransformerXL you provide?

Hi @liugehaizaixue

If the agents share the same policy, then it is mostly a matter of adding an extra dimension for the number of agents. I'd suggest finding a multi-agent PPO baseline and carefully checking how it differs from this single-agent implementation.
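Just to illustrate the idea, here is a minimal sketch of folding the agent dimension into the batch dimension when all agents share one policy (the names and shapes below are placeholders, not this repo's actual API):

```python
import torch

num_workers, num_agents, obs_dim = 8, 4, 84

# One step of data collected from every worker and every agent
obs = torch.zeros(num_workers, num_agents, obs_dim)

# With a shared policy, the agent axis can be folded into the batch axis,
# so the existing single-agent forward pass (and its episodic memory)
# effectively sees num_workers * num_agents independent environments.
flat_obs = obs.reshape(num_workers * num_agents, obs_dim)
print(flat_obs.shape)  # torch.Size([32, 84])
```

Actions, rewards, and the episodic memory buffers would need the same treatment.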

I also prepared an implementation of this repository as a contribution to CleanRL. Maybe that is a helpful resource for your goals as well.
https://github.com/MarcoMeter/cleanrl-ppo-trxl/tree/master/cleanrl/ppo_trxl

Thank you very much for your suggestion. I have started learning MAPPO as you recommended. However, I would still like to ask: are you interested in implementing a MAPPO-TrXL example in the future? After all, you know PPO-TrXL best and could write the cleanest implementation.

Sorry, I don't have the capacity to implement multi-agent code.