Tic Tac Toe , Reinforcement Learning algorithms This project is practice RL methods. There are tons of space for improvement. (eg, rotating/miroring board status) but I didn't do it due to simplicity of understanding the algorithms.
Still I'm not confident with Q Neural Network.
- what is the best MLP topology? leaky_relu or sigmoid? How many layers and neurons of hidden layer?
- what is the best board input?
- what is the best Epsilon and Alpha
I realized DQN is assumed that the most strongest A.I. but still it has so many hyper parameters to implement in reality.
board is 9 integer list, X=1,O=-1,EMPTY=0 eg [0,1,1,-1,0,0,1,1,1]
Just place any available position.
Slightly improved random algorithm. It searches and places win hand if any.
- Simulates n times game with alpha-random policy games for each available pos.
- calculates the win cases of each available pos, and picks the most won hand.
If N is more than 150, it is almost perfect brain.
Q(s,a)= previous Q(s,a) + alpha (reward+gamma*maxQ(s',a') ) Reward:WIN=1, LOSE=-1, DRAW=0 , and plyaing hand reward is 0
It is perfect gamer... Human never win.
MLP:9-162-162-9 , leaky_relu Input:board Output:scores of each action Training: update is same as Q-Learning. only amend the parameter of action index.
This link is very useful. http://outlace.com/Reinforcement-Learning-Part-3/