Experiment with a reinforcement learning technique applied to the Snake game in a Pygame environment
Python, Pygame and PyTorch
Q value = quality of action

0. Init Q value (= init model)
1. Choose action (model.predict(state))
2. Perform action
3. Measure reward
4. Update Q value (+ train model)
5. Repeat from step 1 (see the loop sketch below)
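A minimal sketch of this loop. The `agent` and `game` objects and their method names (`get_state`, `get_action`, `train_step`, `play_step`, `reset`) are illustrative assumptions, not a fixed API:

```python
def train(agent, game):
    # Training loop matching steps 1-5 above; runs until interrupted
    while True:
        state_old = agent.get_state(game)                   # observe current state
        action = agent.get_action(state_old)                # 1. choose action
        reward, game_over, score = game.play_step(action)   # 2./3. perform action, measure reward
        state_new = agent.get_state(game)
        agent.train_step(state_old, action, reward,
                         state_new, game_over)              # 4. update Q value (train model)
        if game_over:                                       # 5. repeat from step 1
            game.reset()
```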
Update Q-Value using the Bellman Equation
Q_new(s,a) = Q(s,a) + α[R(s,a) + γ * max Q(s',a') - Q(s,a)]

- Q_new(s,a)... new Q value for that state and that action
- Q(s,a)... current Q value
- α... learning rate
- R(s,a)... reward for taking that action at that state
- γ... discount rate
- max Q(s',a')... maximum expected future reward given the new state s' and all possible actions a' at that new state
Simplified update used here:

Q = model.predict(state0)
Q_new = R + γ * max(Q(state1))

Mean squared error: loss = (Q_new - Q)^2
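A possible PyTorch training step for this update and loss, written for a single (unbatched) transition. The value gamma = 0.9 and the one-hot `action` tensor are assumptions, not fixed by these notes:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, state0, action, reward, state1, done, gamma=0.9):
    # Predicted Q values for the current state (one value per action)
    pred = model(state0)
    target = pred.clone().detach()
    # Bellman target: Q_new = R + gamma * max Q(state1); no future term on game over
    q_new = reward
    if not done:
        with torch.no_grad():
            q_new = reward + gamma * torch.max(model(state1)).item()
    target[torch.argmax(action).item()] = q_new
    # Mean squared error between target and predicted Q values
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```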
Reward:
- eat food: +10
- game over: -10
- else: 0
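As a tiny helper, this reward scheme could look like the following (the function name is hypothetical):

```python
def compute_reward(ate_food: bool, game_over: bool) -> int:
    # Reward scheme from the notes: -10 on game over, +10 for food, 0 otherwise
    if game_over:
        return -10
    if ate_food:
        return 10
    return 0
```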
Action:
- [1, 0, 0] -> straight
- [0, 1, 0] -> right turn
- [0, 0, 1] -> left turn
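One common way to decode this one-hot action relative to the snake's current heading is to list the four absolute directions clockwise, so a right turn steps forward in that list and a left turn steps back. The `Direction` enum is an assumption for this sketch:

```python
from enum import Enum

class Direction(Enum):
    RIGHT = 0
    DOWN = 1
    LEFT = 2
    UP = 3

def next_direction(current, action):
    # Directions listed clockwise: +1 step = right turn, -1 step = left turn
    clock_wise = [Direction.RIGHT, Direction.DOWN, Direction.LEFT, Direction.UP]
    idx = clock_wise.index(current)
    if action == [1, 0, 0]:
        return current                    # straight: keep heading
    if action == [0, 1, 0]:
        return clock_wise[(idx + 1) % 4]  # right turn
    return clock_wise[(idx - 1) % 4]      # left turn ([0, 0, 1])
```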
State (11 values):
- danger straight, danger right, danger left [0, 0, 0]
- direction left, direction right, direction up, direction down [0, 0, 0, 0]
- food left, food right, food up, food down [0, 0, 0, 0]
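A sketch of building this 11-value state vector (3 danger flags + 4 direction flags + 4 food flags). The `game` attributes (`head`, `food`, `direction`, `is_collision`) and the 20-pixel block size are assumptions; `Direction` is the enum from the previous sketch:

```python
import numpy as np

def get_state(game):
    head = game.head
    dir_l = game.direction == Direction.LEFT
    dir_r = game.direction == Direction.RIGHT
    dir_u = game.direction == Direction.UP
    dir_d = game.direction == Direction.DOWN

    # Points one block away from the head in each absolute direction
    point_l = (head.x - 20, head.y)
    point_r = (head.x + 20, head.y)
    point_u = (head.x, head.y - 20)
    point_d = (head.x, head.y + 20)

    state = [
        # danger straight: collision one step ahead in the current heading
        (dir_r and game.is_collision(point_r)) or (dir_l and game.is_collision(point_l))
        or (dir_u and game.is_collision(point_u)) or (dir_d and game.is_collision(point_d)),
        # danger right: collision one step to the right of the heading
        (dir_u and game.is_collision(point_r)) or (dir_d and game.is_collision(point_l))
        or (dir_l and game.is_collision(point_u)) or (dir_r and game.is_collision(point_d)),
        # danger left: collision one step to the left of the heading
        (dir_d and game.is_collision(point_r)) or (dir_u and game.is_collision(point_l))
        or (dir_r and game.is_collision(point_u)) or (dir_l and game.is_collision(point_d)),
        # current direction, one-hot
        dir_l, dir_r, dir_u, dir_d,
        # food location relative to the head
        game.food.x < head.x,  # food left
        game.food.x > head.x,  # food right
        game.food.y < head.y,  # food up
        game.food.y > head.y,  # food down
    ]
    return np.array(state, dtype=int)
```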