A demo of how "MCTS as regularized policy optimization" works
Primary language: Python
demo.py: demonstrates how "MCTS as regularized policy optimization" works
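The contents of demo.py are not reproduced here. What follows is a minimal, independent sketch of the central computation from the paper "Monte-Carlo Tree Search as Regularized Policy Optimization". Instead of reading the final policy off visit counts, it solves for the regularized policy pi_bar = lambda_N * pi_theta / (alpha - q), where alpha is chosen by bisection so that pi_bar sums to 1. The helper names `lambda_n` and `regularized_policy`, the exploration constant c = 1.25, and the requirement that every prior probability be strictly positive are all illustrative assumptions. They are not taken from the repository.

```python
import numpy as np


def lambda_n(visit_counts, c=1.25):
    """Hypothetical helper: regularization weight
    lambda_N = c * sqrt(N) / (|A| + N), where N is the total visit count.
    The value of c is an assumption."""
    visit_counts = np.asarray(visit_counts, dtype=float)
    n_total = visit_counts.sum()
    num_actions = visit_counts.shape[0]
    return c * np.sqrt(n_total) / (num_actions + n_total)


def regularized_policy(q, prior, lam, iters=100):
    """Hypothetical helper: find pi_bar(a) = lam * prior(a) / (alpha - q(a))
    with sum(pi_bar) == 1 by bisecting on the scalar alpha."""
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Assumption: strictly positive priors keep every denominator positive.
    assert np.all(prior > 0) and lam > 0

    # At lo the largest term equals 1, so the sum is >= 1.
    lo = np.max(q + lam * prior)
    # At hi every term is <= prior(a), so the sum is <= 1.
    hi = np.max(q) + lam

    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = np.sum(lam * prior / (alpha - q))
        # The sum decreases in alpha, so move the bound that keeps 1 bracketed.
        if total > 1.0:
            lo = alpha
        else:
            hi = alpha

    alpha = 0.5 * (lo + hi)
    pi_bar = lam * prior / (alpha - q)
    # Remove any leftover bisection error.
    return pi_bar / pi_bar.sum()


if __name__ == "__main__":
    q = np.array([0.1, 0.5, 0.2])       # illustrative Q-values from a search
    prior = np.array([0.5, 0.2, 0.3])   # illustrative policy-network prior
    lam = lambda_n([10, 30, 5])         # illustrative visit counts
    print(regularized_policy(q, prior, lam))
```

Because pi_bar(a) / prior(a) = lambda_N / (alpha - q(a)), actions with higher Q-values get a larger boost over the prior. As the total visit count grows, lambda_N shrinks and pi_bar leans more heavily on q. When all Q-values are equal, pi_bar reduces to the prior.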