help: Support for multi-agent environments
Closed this issue · 3 comments
Hello, I am very interested in your work and would like to use it in my experiments, but I don't know how to make it support multi-agent environments. I am a beginner, so my wording may be imprecise; please forgive me.
In my environment, each step returns observations and rewards from multiple agents. The number of agents is fixed during training, but can change arbitrarily during testing. It seems this can be viewed as returning multiple batches within a single step.
How should I modify the code to integrate it cleanly with the TransformerXL you provide?
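For illustration, here is a rough sketch of what one environment step returns in my setup (the names and shapes are made up just to show the structure, not my actual code):

```python
import numpy as np

# Hypothetical multi-agent step output, for illustration only.
# num_agents is fixed during training but may differ at test time.
num_agents = 4
obs_dim = 16

obs = np.zeros((num_agents, obs_dim), dtype=np.float32)  # one observation per agent
rewards = np.zeros((num_agents,), dtype=np.float32)      # one scalar reward per agent
dones = np.zeros((num_agents,), dtype=bool)              # per-agent termination flags
```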
If the agents share the same policy, then it would mostly come down to adding an extra dimension for the number of agents. I'd suggest finding a multi-agent PPO baseline and carefully checking how it differs from this single-agent setup.
I also prepared an implementation of this repository as a contribution to CleanRL. Maybe this is a helpful source towards your goals as well.
https://github.com/MarcoMeter/cleanrl-ppo-trxl/tree/master/cleanrl/ppo_trxl
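As a minimal sketch of the "extra agent dimension" idea, assuming the agents share one policy and the network expects a flat batch dimension (the function name and shapes below are illustrative, not this repository's actual API):

```python
import numpy as np

def flatten_agents(obs, rewards, dones):
    """Fold the agent dimension into the batch dimension so a shared policy
    can treat each agent like one more parallel environment.
    Assumed shapes (illustrative): obs is (num_envs, num_agents, obs_dim),
    rewards and dones are (num_envs, num_agents)."""
    num_envs, num_agents, obs_dim = obs.shape
    flat_obs = obs.reshape(num_envs * num_agents, obs_dim)
    flat_rewards = rewards.reshape(num_envs * num_agents)
    flat_dones = dones.reshape(num_envs * num_agents)
    return flat_obs, flat_rewards, flat_dones
```

Each agent would presumably also need its own slot in the episodic TransformerXL memory, just like each parallel environment, so the memory buffers would grow by the same factor.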
Thank you very much for your suggestion. I have started learning MAPPO as you suggested. However, I still really want to ask: are you interested in implementing a MAPPO-TrXL example in the future? After all, you know PPO-TrXL best and could write the most polished code.
Sorry, I don't have the capacity to implement multi-agent code.