
I have implemented a grid world game by iteratively updating the Q-value function, which estimates the value of each (state, action) pair. This time, let's look at how to apply reinforcement learning to an adversarial game, tic-tac-toe, where there are more states and actions and, most importantly, there is an opponent playing against our agent.
• Built an RL agent that learns the game through Q-learning, controlled by hyperparameters such as epsilon (and its decay rate), the learning rate, and the discount factor. We trained the model iteratively to find a good combination of these hyperparameters.
• Streamlined the Python code that implements the tic-tac-toe playing algorithm, which uses the epsilon-greedy method, a reinforcement learning technique.
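
To make the pieces named above concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection for tic-tac-toe. It is not the project's actual code: the board encoding (a 9-tuple of 'X', 'O', or ' '), the function names, and every hyperparameter value are assumptions chosen for illustration. It also assumes that `next_board` is the position the agent sees after the opponent has replied, which is one common way to handle the adversarial setting.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameter values, for illustration only.
ALPHA = 0.1            # learning rate
GAMMA = 0.9            # discount factor
EPSILON_DECAY = 0.999  # multiplicative decay applied after each episode
EPSILON_MIN = 0.05     # floor so the agent never stops exploring entirely

# Q-table keyed by (board, action); unseen pairs default to 0.0.
Q = defaultdict(float)


def legal_actions(board):
    """Return the indices of empty cells on a 9-tuple board."""
    return [i for i, cell in enumerate(board) if cell == ' ']


def choose_action(board, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit Q."""
    actions = legal_actions(board)
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Greedy choice; ties go to the lowest-indexed cell.
    return max(actions, key=lambda a: Q[(board, a)])


def update_q(board, action, reward, next_board, done):
    """Apply one Q-learning update for the transition just observed."""
    next_actions = legal_actions(next_board)
    if done or not next_actions:
        best_next = 0.0  # terminal positions have no future value
    else:
        best_next = max(Q[(next_board, a)] for a in next_actions)
    target = reward + GAMMA * best_next
    Q[(board, action)] += ALPHA * (target - Q[(board, action)])


def decay_epsilon(epsilon):
    """Shrink epsilon after an episode, but not below EPSILON_MIN."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```

In a training loop built on this sketch, the agent would call `choose_action` on each of its turns, call `update_q` once the opponent has moved (or the game has ended), and call `decay_epsilon` at the end of every episode so that play shifts gradually from exploration toward exploitation.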