An efficient and clean implementation of AlphaZero for single-player domains in PyTorch. The implementation is inspired by the awesome EfficientZero implementation, itself a derivative of MuZero. Another invaluable resource was the alphazero_singleplayer repository and its accompanying blog post.
- Worker parallelization using Ray
- Model inference parallelism via Batch MCTS
- Automatic mixed precision (AMP) support
- Many of the improvements used in MuZero, such as min-max value scaling and discrete value support for intermediate rewards during MCTS (a short sketch below illustrates both)
- Model pre-training and training data enrichment through demonstrations (similar to AlphaTensor)
- Easily extendable to new single-player environments: just subclass BaseConfig (see the config sketch below)
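
For context, the min-max value scaling and discrete value support mentioned above are techniques popularized by MuZero. The following is a minimal, self-contained sketch of both ideas; the class and function names are illustrative only and do not necessarily match this repository's actual code.

```python
import torch


class MinMaxStats:
    """Tracks the minimum and maximum value seen in the search tree and
    rescales Q-values into [0, 1], keeping the pUCT exploration term
    well-behaved across environments with very different reward scales."""

    def __init__(self):
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, value: float) -> None:
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value: float) -> float:
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value


def scalar_to_support(x: torch.Tensor, support_size: int) -> torch.Tensor:
    """Projects a batch of scalar targets onto a discrete support of size
    2 * support_size + 1 by splitting probability mass between the two
    nearest integer bins (the categorical value trick from MuZero)."""
    x = x.clamp(-support_size, support_size)
    low = x.floor().long()
    prob_high = x - low                        # fractional part in [0, 1)
    low_idx = low + support_size               # shift into [0, 2 * support_size]
    high_idx = (low_idx + 1).clamp(max=2 * support_size)

    target = torch.zeros(x.shape[0], 2 * support_size + 1)
    rows = torch.arange(x.shape[0])
    target[rows, low_idx] = 1.0 - prob_high
    target[rows, high_idx] += prob_high        # no-op at the upper boundary
    return target
```

On the network side, a matching inverse transform (the expectation over the support bins) is typically used to convert the predicted distribution back into a scalar value or reward.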
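
The extension point mentioned in the last bullet could look roughly like the following hypothetical sketch. The import path, attribute names, and the `new_game` hook are assumptions made for illustration; the actual interface is defined by the repository's BaseConfig, and the existing environment configs (e.g. the CartPole one) are the authoritative reference.

```python
import gym  # the environment library is an assumption; the repo may wrap envs differently

from core.config import BaseConfig  # hypothetical import path


class MountainCarConfig(BaseConfig):
    """Illustrative config for adding a new single-player Gym environment."""

    def __init__(self):
        super().__init__()                # the actual BaseConfig may take arguments
        self.env_name = "MountainCar-v0"  # Gym environment id
        self.action_space_size = 3        # push left, no-op, push right
        self.observation_shape = (2,)     # car position and velocity
        self.max_moves = 200              # episode length limit

    def new_game(self):
        # Hypothetical hook: return a fresh environment instance for a self-play worker.
        return gym.make(self.env_name)
```

How a new config is wired into main.py's `--env` flag depends on the repository and is not shown here.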
Install the Python dependencies with

pip install -r requirements.txt

and then install PyTorch with CUDA 11.6 support:

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
To train and then test AlphaZero on CartPole, run:
python main.py --env cartpole --opr train,test