Snake-AI

Implementing Deep Q Learning on the classic Snake game.

A feedforward neural network with three hidden layers is used. The network employs the ReLU activation function in its hidden layers. The input layer consists of 11 nodes derived from the state of the snake, and the output layer consists of three action nodes, i.e. the directions the snake can move in.
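
A minimal sketch of such a network in PyTorch. Only the 11 inputs, 3 outputs, three hidden layers, and ReLU come from the description above; the hidden-layer width and the use of PyTorch itself are illustrative assumptions:

```python
import torch.nn as nn

class SnakeQNet(nn.Module):
    """Feedforward Q-network: 11 state features in, 3 action values out."""

    def __init__(self, input_size=11, hidden_size=256, output_size=3):
        super().__init__()
        # Three hidden layers with ReLU activations; hidden_size is an assumed width
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, output_size),  # raw Q-values for the 3 actions
        )

    def forward(self, x):
        return self.net(x)
```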

The π‘Žπ‘π‘‘π‘–π‘œπ‘›π‘  are the choices made by the agent The π‘ π‘‘π‘Žπ‘‘π‘’π‘  are the basis for making the choices The π‘Ÿπ‘’π‘€π‘Žπ‘Ÿπ‘‘π‘  are the basis for evaluating the choices

Deep Q Algorithm-

  1. Initialise the Q-values
  2. Choose the action to be performed; the action-selection policy is epsilon-greedy
  3. Perform the action (A_n) for time step n and measure the reward (R_n) associated with that action
  4. Update the Q-value for the action A_n, then repeat from step 2 (a sketch of one such training step follows this list)
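
A minimal sketch of one epsilon-greedy selection (step 2) and one Q-value update (steps 3-4) in PyTorch. This is illustrative only: the discount factor, the helper names, and the absence of a replay buffer or target network are simplifying assumptions, not a description of this repo's training loop.

```python
import random
import torch
import torch.nn.functional as F

def select_action(model, state, epsilon, n_actions=3):
    """Epsilon-greedy action selection (step 2)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)              # explore
    with torch.no_grad():
        q_values = model(torch.as_tensor(state, dtype=torch.float32))
        return int(torch.argmax(q_values))               # exploit

def train_step(model, optimizer, state, action, reward, next_state, done, gamma=0.9):
    """Update the Q-value for the taken action (step 4),
    using the Bellman target r + gamma * max_a' Q(s', a')."""
    state = torch.as_tensor(state, dtype=torch.float32)
    next_state = torch.as_tensor(next_state, dtype=torch.float32)

    q_pred = model(state)[action]
    with torch.no_grad():
        q_next = torch.zeros(()) if done else model(next_state).max()
    q_target = reward + gamma * q_next

    loss = F.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```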

A rule that dictates how actions are chosen as a function of the state is called a policy. Each policy has a value function, which maps every state-action pair to the expected return obtained if that pair is taken. An optimal value function assigns the largest expected return to each state, or state-action pair. We use the Bellman optimality equation to derive these optimal value functions.
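
For reference, the Bellman optimality equation for the action-value (Q) function, which the update in step 4 above approximates, is:

Q*(s, a) = E[ R_{t+1} + γ · max_{a'} Q*(S_{t+1}, a') | S_t = s, A_t = a ]

where γ is the discount factor.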

The 11 state features we use are [direction left, direction right, direction up, direction down], [food up, food down, food right, food left], and [danger straight, danger right, danger left]. The moves are chosen by a decaying epsilon-greedy algorithm, so the agent explores heavily at first and gradually shifts to exploiting its learned Q-values. A sketch of assembling this state vector follows.
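
A sketch of how such an 11-element state vector could be assembled. The coordinate convention, function name, and argument layout are illustrative assumptions rather than this repo's actual code:

```python
import numpy as np

def build_state(head, food, direction, danger):
    """Assemble the 11-feature state vector described above.

    head, food : (x, y) grid coordinates (assumed convention: x grows right, y grows down)
    direction  : one of "LEFT", "RIGHT", "UP", "DOWN"
    danger     : dict of booleans for "straight", "right", "left"
                 (collision checks are assumed to be done by the game loop)
    """
    hx, hy = head
    fx, fy = food
    return np.array([
        direction == "LEFT", direction == "RIGHT",
        direction == "UP",   direction == "DOWN",
        fy < hy,  # food up
        fy > hy,  # food down
        fx > hx,  # food right
        fx < hx,  # food left
        danger["straight"], danger["right"], danger["left"],
    ], dtype=int)

# Example: head at (5, 5), food at (7, 3), moving right, no immediate danger
# -> [0 1 0 0 1 0 1 0 0 0 0]
print(build_state((5, 5), (7, 3), "RIGHT",
                  {"straight": False, "right": False, "left": False}))
```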
