AlphaZero for single-player environments, implemented efficiently using Ray

Primary language: Python

(Efficient) AlphaZero

An efficient and clean implementation of AlphaZero for single-player domains in PyTorch. The implementation is inspired by the awesome EfficientZero implementation, a derivative work that builds on MuZero. Another invaluable resource was the alphazero_singleplayer repository and the corresponding blog post.

Features

  • Worker parallelization using Ray
  • Model inference parallelism via Batch MCTS
  • AMP support
  • Several improvements from MuZero, such as min-max value scaling and discrete value support for intermediate rewards during MCTS
  • Model pre-training and training data enrichment through demonstrations (similar to AlphaTensor)
  • Easily extensible to new single-player environments (just subclass BaseConfig)
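
To illustrate the min-max value scaling mentioned above, here is a minimal sketch in the style of the MuZero pseudocode. It is not taken from this repository: the class name MinMaxStats and its methods are assumptions chosen for illustration. The idea is to track the smallest and largest value estimates seen during search and rescale them to [0, 1] before they enter the UCB score, so that environments with unbounded returns use the same exploration constants.

```python
from dataclasses import dataclass


@dataclass
class MinMaxStats:
    """Hypothetical helper: tracks the value range observed in the search tree."""

    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Widen the observed range whenever a new value estimate is backed up.
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value: float) -> float:
        # Rescale only once two distinct values have been seen;
        # otherwise return the value unchanged to avoid dividing by zero.
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value


stats = MinMaxStats()
for q in (2.0, 10.0, 6.0):
    stats.update(q)

print(stats.normalize(6.0))  # 0.5
```

In a search loop, update would typically be called during backpropagation and normalize when scoring child nodes for selection.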

Setup

Install the Python dependencies:

pip install -r requirements.txt

Then install PyTorch with CUDA 11.6 support:

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

Example usage

To train AlphaZero on CartPole, run:

python main.py --env cartpole --opr train,test