Experiment with a reinforcement learning technique applied to the Snake game in a Pygame environment
Python, Pygame and PyTorch
Q value = quality of action

0. Init Q value (= init model)
1. Choose action (model.predict(state))
2. Perform action
3. Measure reward
4. Update Q value (+ train model)
5. Repeat from step 1 (see the loop sketch below)
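A minimal sketch of this loop. The `agent` and `game` objects and their method names (`get_state`, `get_action`, `train_step`, `play_step`, `reset`) are illustrative assumptions, not a fixed API:

```python
def train(agent, game):
    # Training loop matching steps 1-5 above; runs until interrupted
    while True:
        state_old = agent.get_state(game)                   # observe current state
        action = agent.get_action(state_old)                # 1. choose action
        reward, game_over, score = game.play_step(action)   # 2./3. perform action, measure reward
        state_new = agent.get_state(game)
        agent.train_step(state_old, action, reward,
                         state_new, game_over)              # 4. update Q value (train model)
        if game_over:                                       # 5. repeat from step 1
            game.reset()
```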
Update Q-Value using the Bellman Equation
Q_new(s,a) = Q(s,a) + α[R(s,a) + γ * max Q(s',a') - Q(s,a)]

- Q_new(s,a)... new Q value for that state and that action
- Q(s,a)... current Q value
- α... learning rate
- R(s,a)... reward for taking that action at that state
- γ... discount rate
- max Q(s',a')... maximum expected future reward given the new state s' and all possible actions a' at that new state
Simplified update used here:

Q = model.predict(state0)
Q_new = R + γ * max(Q(state1))

Mean squared error: loss = (Q_new - Q)^2
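A possible PyTorch training step for this update and loss, written for a single (unbatched) transition. The value gamma = 0.9 and the one-hot `action` tensor are assumptions, not fixed by these notes:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, state0, action, reward, state1, done, gamma=0.9):
    # Predicted Q values for the current state (one value per action)
    pred = model(state0)
    target = pred.clone().detach()
    # Bellman target: Q_new = R + gamma * max Q(state1); no future term on game over
    q_new = reward
    if not done:
        with torch.no_grad():
            q_new = reward + gamma * torch.max(model(state1)).item()
    target[torch.argmax(action).item()] = q_new
    # Mean squared error between target and predicted Q values
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```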
Reward:
- eat food: +10
- game over: -10
- else: 0
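As a tiny helper, this reward scheme could look like the following (the function name is hypothetical):

```python
def compute_reward(ate_food: bool, game_over: bool) -> int:
    # Reward scheme from the notes: -10 on game over, +10 for food, 0 otherwise
    if game_over:
        return -10
    if ate_food:
        return 10
    return 0
```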
Action:
- [1, 0, 0] -> straight
- [0, 1, 0] -> right turn
- [0, 0, 1] -> left turn
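One common way to decode this one-hot action relative to the snake's current heading is to list the four absolute directions clockwise, so a right turn steps forward in that list and a left turn steps back. The `Direction` enum is an assumption for this sketch:

```python
from enum import Enum

class Direction(Enum):
    RIGHT = 0
    DOWN = 1
    LEFT = 2
    UP = 3

def next_direction(current, action):
    # Directions listed clockwise: +1 step = right turn, -1 step = left turn
    clock_wise = [Direction.RIGHT, Direction.DOWN, Direction.LEFT, Direction.UP]
    idx = clock_wise.index(current)
    if action == [1, 0, 0]:
        return current                    # straight: keep heading
    if action == [0, 1, 0]:
        return clock_wise[(idx + 1) % 4]  # right turn
    return clock_wise[(idx - 1) % 4]      # left turn ([0, 0, 1])
```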
State (11 values):
- danger straight, danger right, danger left [0, 0, 0]
- direction left, direction right, direction up, direction down [0, 0, 0, 0]
- food left, food right, food up, food down [0, 0, 0, 0]
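A sketch of building this 11-value state vector (3 danger flags + 4 direction flags + 4 food flags). The `game` attributes (`head`, `food`, `direction`, `is_collision`) and the 20-pixel block size are assumptions; `Direction` is the enum from the previous sketch:

```python
import numpy as np

def get_state(game):
    head = game.head
    dir_l = game.direction == Direction.LEFT
    dir_r = game.direction == Direction.RIGHT
    dir_u = game.direction == Direction.UP
    dir_d = game.direction == Direction.DOWN

    # Points one block away from the head in each absolute direction
    point_l = (head.x - 20, head.y)
    point_r = (head.x + 20, head.y)
    point_u = (head.x, head.y - 20)
    point_d = (head.x, head.y + 20)

    state = [
        # danger straight: collision one step ahead in the current heading
        (dir_r and game.is_collision(point_r)) or (dir_l and game.is_collision(point_l))
        or (dir_u and game.is_collision(point_u)) or (dir_d and game.is_collision(point_d)),
        # danger right: collision one step to the right of the heading
        (dir_u and game.is_collision(point_r)) or (dir_d and game.is_collision(point_l))
        or (dir_l and game.is_collision(point_u)) or (dir_r and game.is_collision(point_d)),
        # danger left: collision one step to the left of the heading
        (dir_d and game.is_collision(point_r)) or (dir_u and game.is_collision(point_l))
        or (dir_r and game.is_collision(point_u)) or (dir_l and game.is_collision(point_d)),
        # current direction, one-hot
        dir_l, dir_r, dir_u, dir_d,
        # food location relative to the head
        game.food.x < head.x,  # food left
        game.food.x > head.x,  # food right
        game.food.y < head.y,  # food up
        game.food.y > head.y,  # food down
    ]
    return np.array(state, dtype=int)
```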