RL-Gomoku

Implementations of several reinforcement learning algorithms for Gomoku. The code is optimized to some extent while remaining simple and clean.

Current Progress

  • [X] Sarsa (Tabular)
  • [X] Q-Learning (Tabular)
  • [X] Monte Carlo Tree Search (UCT)
  • [X] REINFORCE (w/ & w/o Baseline)
  • [X] Actor-Critic
  • [ ] TO BE ADDED
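
The two tabular methods above maintain the same state-action value table and apply the same TD update; they differ only in the bootstrap target, which uses the greedy next action for Q-Learning (off-policy) and the action the policy actually takes next for Sarsa (on-policy). A minimal sketch of that update follows; it is not this repo's actual code, and the hyperparameters and the hashable board-state encoding are placeholders:

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # hypothetical hyperparameters, not from this repo

Q = defaultdict(float)                   # Q[(state, action)] -> value; state is any hashable board encoding

def epsilon_greedy(state, legal_actions):
    # Explore with probability EPSILON, otherwise take the greedy action.
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def td_update(state, action, reward, next_state, next_legal, next_action=None):
    # One tabular TD step.
    # Q-Learning: bootstrap from the best next action (pass next_action=None).
    # Sarsa:      bootstrap from the action the policy actually takes next.
    if not next_legal:                   # terminal position
        target = reward
    elif next_action is None:            # Q-Learning (off-policy)
        target = reward + GAMMA * max(Q[(next_state, a)] for a in next_legal)
    else:                                # Sarsa (on-policy)
        target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])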

Usage

> python3 main.py
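
main.py drives training and the head-to-head matchups reported below. Purely as an illustration of how tallies like those in the Sample Results section can be produced, here is a generic evaluation loop; the env and player interfaces are hypothetical and do not reflect this repo's actual classes:

def evaluate(env, first_player, second_player, n_games=100):
    # Play n_games and tally results from first_player's perspective.
    wins = losses = draws = 0
    for _ in range(n_games):
        state = env.reset()
        players = (first_player, second_player)
        turn = 0
        while True:
            action = players[turn].act(state)        # current player picks a move
            state, winner, done = env.step(action)   # winner in {0, 1, None}
            if done:
                if winner is None:
                    draws += 1
                elif winner == 0:                    # the player who moved first won
                    wins += 1
                else:
                    losses += 1
                break
            turn = 1 - turn                          # alternate players
    print(f"#Win={wins}, #Lose={losses}, #Draw={draws}")

The loop simply alternates moves between the two players and reports the outcome from first_player's point of view.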

Sample Results

Each matchup below is 100 games (10 for the MCTS matchups). The 1P and 2P lines report the first-named player's record when it moves first and second, respectively.

Random vs Random
-- 1P:  #Win=56, #Lose=29, #Draw=15
-- 2P:  #Win=20, #Lose=67, #Draw=13

Sarsa vs Random
-- 1P:  #Win=98, #Lose=0, #Draw=2
-- 2P:  #Win=60, #Lose=28, #Draw=12

Q-Learning vs Random
-- 1P:  #Win=99, #Lose=0, #Draw=1
-- 2P:  #Win=58, #Lose=34, #Draw=8

REINFORCE vs Random
-- 1P:  #Win=59, #Lose=30, #Draw=11
-- 2P:  #Win=34, #Lose=51, #Draw=15

Actor-Critic vs Random
-- 1P:  #Win=82, #Lose=11, #Draw=7
-- 2P:  #Win=48, #Lose=51, #Draw=1

MCTSPlayer vs Random
-- 1P:  #Win=9, #Lose=0, #Draw=1
-- 2P:  #Win=9, #Lose=0, #Draw=1

Sarsa vs Q-Learning
-- 1P:  #Win=100, #Lose=0, #Draw=0
-- 2P:  #Win=0, #Lose=0, #Draw=100

REINFORCE vs Sarsa
-- 1P:  #Win=9, #Lose=90, #Draw=1
-- 2P:  #Win=0, #Lose=100, #Draw=0

REINFORCE vs Q-Learning
-- 1P:  #Win=30, #Lose=55, #Draw=15
-- 2P:  #Win=0, #Lose=98, #Draw=2

Actor-Critic vs Sarsa
-- 1P:  #Win=0, #Lose=97, #Draw=3
-- 2P:  #Win=0, #Lose=100, #Draw=0

Actor-Critic vs Q-Learning
-- 1P:  #Win=44, #Lose=56, #Draw=0
-- 2P:  #Win=0, #Lose=100, #Draw=0

Actor-Critic vs REINFORCE
-- 1P:  #Win=66, #Lose=22, #Draw=12
-- 2P:  #Win=19, #Lose=81, #Draw=0

MCTSPlayer vs Sarsa
-- 1P:  #Win=6, #Lose=1, #Draw=3
-- 2P:  #Win=0, #Lose=1, #Draw=9

MCTSPlayer vs Q-Learning
-- 1P:  #Win=9, #Lose=0, #Draw=1
-- 2P:  #Win=0, #Lose=0, #Draw=10

MCTSPlayer vs REINFORCE
-- 1P:  #Win=10, #Lose=0, #Draw=0
-- 2P:  #Win=10, #Lose=0, #Draw=0

MCTSPlayer vs Actor-Critic
-- 1P:  #Win=9, #Lose=0, #Draw=1
-- 2P:  #Win=10, #Lose=0, #Draw=0

Requirements

  • NumPy
  • Numba
  • PyTorch (0.4.1)