
MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as Gym). You only need to add a game file containing the hyperparameters and the game class, as sketched below. Please refer to the documentation and the example.
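
For illustration, here is a minimal sketch of what such a game file might look like. The class and method names (MuZeroConfig, Game, step, legal_actions, reset) and the handful of hyperparameters shown are assumptions modelled on the bundled examples, not the full interface; check the games shipped with the repository for the authoritative version.

```python
import numpy


class MuZeroConfig:
    def __init__(self):
        # Illustrative subset of hyperparameters; the real config is longer.
        self.observation_shape = (1, 1, 4)  # channels x height x width
        self.action_space = list(range(2))  # indices of all possible actions
        self.players = list(range(1))       # single-player game
        self.max_moves = 500
        self.num_simulations = 50           # MCTS simulations per move
        self.discount = 0.997


class Game:
    """Thin wrapper exposing the environment to MuZero (hypothetical interface)."""

    def __init__(self, seed=None):
        self.state = numpy.zeros(4, dtype=numpy.float32)

    def step(self, action):
        # Apply an action and return (observation, reward, done).
        self.state[action] += 1
        done = bool(self.state.sum() >= 10)
        return self.state.reshape(1, 1, 4), (1.0 if done else 0.0), done

    def legal_actions(self):
        return [0, 1]

    def reset(self):
        self.state = numpy.zeros(4, dtype=numpy.float32)
        return self.state.reshape(1, 1, 4)
```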

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also closely related to value prediction networks. See How it works.
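
As a rough illustration of that idea, the toy sketch below mimics the paper's three learned functions with hand-written stand-ins: h (representation) encodes an observation into a hidden state, g (dynamics) predicts a reward and the next hidden state, and f (prediction) outputs policy logits and a value. These are not the repository's networks; in real MuZero all three are neural networks trained jointly.

```python
import numpy


def representation(observation):
    # h: raw observation -> hidden state (here: a fixed toy encoding).
    return numpy.tanh(observation.flatten()[:8])


def dynamics(hidden_state, action):
    # g: (hidden state, action) -> (predicted reward, next hidden state).
    next_state = numpy.roll(hidden_state, action + 1)
    return float(next_state[0]), next_state


def prediction(hidden_state):
    # f: hidden state -> (policy logits, value estimate).
    return hidden_state[:2], float(hidden_state.mean())


# Planning unrolls g and f from the output of h, never querying the real
# environment again: this is what "learning a model" buys MuZero.
state = representation(numpy.ones((1, 1, 8), dtype=numpy.float32))
for action in (0, 1, 0):
    reward, state = dynamics(state, action)
    policy_logits, value = prediction(state)
```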

...ORIGINAL README

Mokemokechicken's Experiment Results

Note

The original repository is very nice. However, it has some problems in two-player games and needs to be patched. Check the all_patch_for_master branch if you want to try muzero-general.

Results

  • TicTacToe: trained to near-perfect play.
  • connect4: trained to a strong level (always wins against the expert opponent).
  • animal_shogi: ready to go.