Stochastic MuZero for Simultaneous-Move Games
Zachary-Fernandes opened this issue · 2 comments
Hello,
I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use, poke-env, can expose an OpenAI Gym wrapper, which makes me think it should be possible to use it with this package. The agent would train through self-play on a local Showdown server and then, ideally, be evaluated by challenging opponents on the main Showdown server.
I wanted to ask some questions before I start experimenting. First, Pokémon is a simultaneous-move game, and I understand this is a departure from the sequential-move games, such as Go, that the original AlphaZero model was designed for. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?
Second, this would be a new environment used through Gym, so I hope it is simple to add to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.
- Use the original hyperparameters from the paper (see its pseudocode).
- Your simulation is fundamentally combinatorial, so you will need to explore it; there is no shortcut through hyperparameter tuning.
- Get a lot of cloud compute; I would say 10x what the paper used for Backgammon:
  "For Backgammon, we used 1 TPU for training and 16 TPUs for acting, for approximately 27 hours, equivalent to 10 days on a single V100 GPU." (https://openreview.net/pdf?id=X6D9bAHhBQ1, page 15)
- For building the simulation, refer to https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/; for an example, see https://github.com/DHDev0/Stochastic-time-series-forecast-simulator
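To put the "10x" suggestion in concrete terms, the arithmetic below scales the Backgammon figures quoted from the paper. It assumes all 17 TPUs (1 training, 16 acting) ran for the full 27 hours and that "10x" applies uniformly to both the TPU budget and the single-V100 equivalent; both are interpretations, not statements from the paper. `scaled_budget` is a hypothetical helper.

```python
# Backgammon figures quoted from the Stochastic MuZero paper (page 15).
TRAIN_TPUS = 1
ACTING_TPUS = 16
WALL_CLOCK_HOURS = 27
V100_EQUIVALENT_DAYS = 10


def scaled_budget(multiplier: int) -> dict:
    """Scale the Backgammon compute budget by `multiplier`.

    Assumes every TPU was busy for the whole wall-clock run, so
    TPU-hours = (training TPUs + acting TPUs) * hours.
    """
    tpu_hours = (TRAIN_TPUS + ACTING_TPUS) * WALL_CLOCK_HOURS
    return {
        "tpu_hours": tpu_hours * multiplier,
        "v100_days": V100_EQUIVALENT_DAYS * multiplier,
    }


baseline = scaled_budget(1)   # {'tpu_hours': 459, 'v100_days': 10}
suggested = scaled_budget(10)  # {'tpu_hours': 4590, 'v100_days': 100}
```

Under these assumptions, 10x works out to roughly 4,590 TPU-hours, or about 100 days on a single V100.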
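One way to fit a simultaneous-move game into a single-agent Gym/Gymnasium-style interface is to resolve the opponent's move inside `step`, choosing it from the previous observation so it is committed before the agent's action is seen. The sketch below is a hypothetical toy (rock-paper-scissors, not Pokémon). It mirrors the `reset(seed, options) -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)` conventions from the Gymnasium tutorial linked above but does not import gymnasium, so it runs standalone. A real environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`. Whether this design plugs directly into Stochastic MuZero's self-play loop is an assumption.

```python
import random
from typing import Callable, Optional

ROCK, PAPER, SCISSORS = 0, 1, 2
NO_MOVE = 3  # observation value before any round has been played


def round_outcome(a: int, b: int) -> int:
    """+1 if move a beats move b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1


class SimultaneousRPSEnv:
    """Hypothetical simultaneous-move toy game with Gymnasium-style returns."""

    def __init__(self, max_rounds: int = 5,
                 opponent: Optional[Callable[[tuple], int]] = None):
        self.max_rounds = max_rounds
        self._rng = random.Random()
        # The opponent sees the same observation as the agent;
        # by default it plays uniformly at random.
        self.opponent = opponent or (lambda obs: self._rng.randrange(3))
        self._round = 0
        self._obs = (NO_MOVE, NO_MOVE)

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        if seed is not None:
            self._rng.seed(seed)
        self._round = 0
        self._obs = (NO_MOVE, NO_MOVE)
        return self._obs, {}

    def step(self, action: int):
        if action not in (ROCK, PAPER, SCISSORS):
            raise ValueError(f"invalid action {action}")
        # Simultaneity: the opponent decides from the previous observation,
        # never from the agent's current `action`.
        opp_action = self.opponent(self._obs)
        reward = float(round_outcome(action, opp_action))
        self._round += 1
        self._obs = (action, opp_action)  # both moves revealed together
        terminated = self._round >= self.max_rounds
        truncated = False
        return self._obs, reward, terminated, truncated, {"opponent_action": opp_action}


# Usage: an agent that always plays PAPER against an always-ROCK opponent.
env = SimultaneousRPSEnv(max_rounds=3, opponent=lambda obs: ROCK)
obs, info = env.reset(seed=0)
total = 0.0
terminated = False
while not terminated:
    obs, reward, terminated, truncated, info = env.step(PAPER)
    total += reward
# total == 3.0: paper beats rock in every round
```

Replacing the fixed `opponent` callable with a frozen copy of the agent's own policy is one hypothetical way to turn this into self-play.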
I will close this issue; let me know if you have any other questions.