DHDev0/Stochastic-muzero

reproducing the result on 2048

Closed this issue · 3 comments

Hello Daniel, thanks a lot for your contribution. I am trying to reproduce the reported result on the 2048 game, but I cannot find the environment implementation in the repo. I would appreciate it if you could recommend an open-source 2048 env to use with this repo.

Simulation: https://pypi.org/project/gym-2048/ (you will probably need to upgrade the env to gymnasium 0.27)
Hyperparameters: https://openreview.net/pdf?id=X6D9bAHhBQ1 (page 20)
You will also have to instantiate multiple machines for simulation with Ray by modifying https://github.com/DHDev0/Stochastic-muzero/blob/main/self_play.py (line 241) so that it runs on a cluster of Ray machines (https://docs.ray.io/en/releases-2.0.0/cluster/getting-started.html).
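
On the gymnasium upgrade: the main API difference is that older Gym envs return `obs` from `reset()` and a 4-tuple `(obs, reward, done, info)` from `step()`, while Gymnasium expects `(obs, info)` and a 5-tuple with separate `terminated` and `truncated` flags. Below is a minimal sketch of an adapter, assuming the 2048 env follows the old Gym API (I have not checked gym-2048 against this). The names `LegacyToGymnasiumAdapter` and `ToyLegacyEnv` are hypothetical and are not part of this repo, gym-2048, or Gymnasium; the toy env only stands in for the real one so the snippet runs on its own.

```python
class ToyLegacyEnv:
    """Hypothetical stand-in for an old-style Gym env (NOT the real gym-2048)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        # Old API: reset() returns only the observation.
        self.t = 0
        return self.t

    def step(self, action):
        # Old API: step() returns (obs, reward, done, info).
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


class LegacyToGymnasiumAdapter:
    """Expose a Gymnasium-style reset()/step() on top of an old-style env."""

    def __init__(self, legacy_env, max_episode_steps=None):
        self.env = legacy_env
        self.max_episode_steps = max_episode_steps  # optional time limit
        self._elapsed = 0

    def reset(self, *, seed=None, options=None):
        # Old envs were seeded through a separate seed() method, if they had one.
        if seed is not None and hasattr(self.env, "seed"):
            self.env.seed(seed)
        self._elapsed = 0
        obs = self.env.reset()
        return obs, {}  # new API: (obs, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        terminated = bool(done)
        # Truncation is signalled only when the time limit ends the episode,
        # not when the env itself reported the episode as finished.
        truncated = (
            self.max_episode_steps is not None
            and self._elapsed >= self.max_episode_steps
            and not terminated
        )
        return obs, reward, terminated, truncated, info


if __name__ == "__main__":
    env = LegacyToGymnasiumAdapter(ToyLegacyEnv())
    obs, info = env.reset(seed=0)
    finished = False
    while not finished:
        obs, reward, terminated, truncated, info = env.step(0)
        finished = terminated or truncated
    print("episode ended at observation", obs)
```

Gymnasium also provides its own compatibility tooling for old-style envs, which may be a better fit than a hand-written adapter; check the documentation for the version you install.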

"In 2048 we used 1 TPU for training and 4 TPUs for acting, for 80 hours per experiment; equivalent to roughly 8 days on a V100." (https://openreview.net/pdf?id=X6D9bAHhBQ1, page 16)

Thanks for your response, Daniel. I will give it a try :)

You're welcome. Let me know if you need help (just open a new issue).