/galvanise_zero

Learning from zero (mostly based off of AlphaZero) in General Game Playing.

Primary LanguagePythonMIT LicenseMIT

What?

galvanise is a General Game Player, where games are written in GDL. The original galvanise code was converted to a library ggplib and galvanise_zero adds AlphaZero style learning. Much inspiration was from Deepmind’s related papers, and the excellent Expert Iteration paper. A number of Alpha*Zero open source projects were also inspirational: LeelaZero and KataGo (XXX add links).

Features

  • there is no game specific code other than the GDL description of the games, a high level python configuration file describing GDL symbols to state mapping and symmetries (see here for more information).
  • multiple policies - train assymetric games
  • fully automated, put in oven and strong model is baked
  • network replaced during self play games
  • training is very fast using proper coroutines at the C level. 1000s of concurrent games are trained using large batch sizes on GPU (for small networks).
  • uses a post processed replay buffer, which uses the excellent bcolz (XXX link) project. Training can allow arbitrary sampling from the buffer (giving emphasis to most recent data).
  • initially project used expert iteration. This was deprecated in favour of oscillating sampling (similar to KataGo).
  • 3 value heads for games with draws

Training

  • used same setting for training all games types (cpuct 0.85, fpu 0.25).
  • uses smaller number of evaluations (200) than A0, oscillating sampling during training (75% of moves are skipped, using much less evals to do so).
  • policy squashing and extra noise to prevent overfitting
  • models use dropout, global average pooling and squeeze excite blocks (these are optional)
  • in general, takes 3-5 days in many of the trained game types below to become super human strength

See gzero_bot for how to play on Little Golem.

Status

Games with significant training, links to elo graphs and models:

Little Golem Champion in last attempts @ Connect6, Hex13, Amazons and Breakthrough, winning all matches. Retired from further Championships. Connect6 and Hex 13 are currently rated 1st and 2nd respectively on active users.

Amazons and Breakthrough won gold medals at ICGA 2018 Computer Olympiad. 👏 👏

Reversi is also strong relative to humans on LG, yet performs a bit worse than top AB programs (about ntest level 20 the last time I tested).

Also trained Baduk 9x9, it had a rating ~2900 elo on CGOS after 2-3 week of training.

Running

The code is in fairly good shape, but could do with some refactoring and documentation (especially a how to guide on how to train a game). It would definitely be good to have an extra pair of eyes on it. I’ll welcome and support anyone willing to try training a game for themselves. Some notes:

  1. python is 2.7
  2. requires a GPU/tensorflow
  3. good starting point is https://github.com/richemslie/ggp-zero/blob/dev/src/ggpzero/defs

How to run and install instruction coming soon!