DHDev0/Stochastic-muzero

Stochastic MuZero for Simultaneous-Move Games

Zachary-Fernandes opened this issue · 2 comments

Hello,

I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use - poke-env - can expose an OpenAI Gym wrapper, which is what makes me think it should be possible to use it. The agent would use self-play on a local Showdown server to train and then ideally be evaluated by challenging opponents on the main Showdown server.

I wanted to ask some questions before I started experimenting. First, Pokémon is a simultaneous-move game, and I understand this is a departure from sequential-move games the original AlphaZero model worked on like Go. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?

Second, this would be a new environment used through Gym, so I would hope it is simple to add the environment to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.

I will close this issue, let me know if you have other question.