A feedforward neural network with three hidden layers is used. The network employs the ReLU activation function in its hidden layers. The input layer consists of 11 nodes derived from the state of the snake, and the output layer consists of three action nodes, i.e. the directions the snake can move in.
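As a sketch of this architecture (assuming NumPy and a hypothetical hidden-layer width of 256, which the text does not specify), the forward pass could look like:

```python
import numpy as np

def relu(x):
    # ReLU activation: max(0, x) elementwise
    return np.maximum(0.0, x)

class SnakeQNetwork:
    """Feedforward network: 11 inputs -> 3 hidden ReLU layers -> 3 outputs."""

    def __init__(self, hidden=256, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [11, hidden, hidden, hidden, 3]
        # Small random weights; biases start at zero
        self.W = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, state):
        x = np.asarray(state, dtype=float)
        # ReLU on the three hidden layers, linear output for the Q-values
        for W, b in zip(self.W[:-1], self.b[:-1]):
            x = relu(x @ W + b)
        return x @ self.W[-1] + self.b[-1]

net = SnakeQNetwork()
q_values = net.forward(np.zeros(11))
print(q_values.shape)  # (3,)
```

The linear (identity) output layer is the usual choice for Q-value regression, since Q-values are unbounded.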
The actions are the choices made by the agent. The states are the basis for making the choices. The rewards are the basis for evaluating the choices.
Deep Q-Learning Algorithm:
- Initialise the Q-values
- Choose the action to be performed; the action-selection policy is epsilon-greedy
- Perform the action (A_t) for time step t and measure the reward (R_t) associated with that action
- Update the Q-value for the action A_t
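The steps above can be sketched with the standard tabular Q-learning update rule, a simplification of the deep version in which a table stands in for the network (the learning rate, discount factor, and the single illustrative transition here are assumptions):

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, epsilon):
    # With probability epsilon explore a random action, otherwise exploit
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in range(n_actions))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Q = defaultdict(float)  # unseen state-action pairs default to 0
# One illustrative step: in state 0, act, receive reward 1, land in state 1
a = epsilon_greedy(Q, 0, n_actions=3, epsilon=0.1)
q_update(Q, 0, a, reward=1.0, next_state=1, n_actions=3)
print(Q[(0, a)])  # 0.1
```

In the deep variant, the table lookup is replaced by the network's predicted Q-values and the update becomes a gradient step on the squared error against the same target.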
Each rule that dictates how actions are taken as a function of the state is called a policy. Each policy has a value function that associates every state-action pair with the expected return obtained if that state-action pair is performed. An optimal value function assigns the largest expected return to each state, or state-action pair. We will use the Bellman optimality equation here to derive these optimal value functions.
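For reference, the Bellman optimality equation for the action-value function, in standard notation (the text does not spell it out), is:

```latex
q_*(s, a) = \mathbb{E}\!\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \,\middle|\, S_t = s,\ A_t = a \right]
```

It says the optimal value of taking action a in state s equals the expected immediate reward plus the discounted value of acting optimally from the next state; the Q-update in the algorithm above nudges the estimate toward this target.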
The 11 state variables we will use are [direction left, direction right, direction up, direction down], [food up, food down, food right, food left], and [danger straight, danger right, danger left]. The moves are chosen by an epsilon-greedy algorithm.
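A minimal sketch of how such an 11-element state vector might be assembled (the function name, parameters, and relative-coordinate convention are illustrative assumptions, not from the text):

```python
def build_state(direction, food_dx, food_dy, danger):
    """Encode the snake's situation as 11 booleans (as ints).

    direction: one of "left", "right", "up", "down"
    food_dx, food_dy: food position relative to the head (negative dy = above)
    danger: dict with keys "straight", "right", "left" -> bool
    """
    return [
        int(direction == "left"),
        int(direction == "right"),
        int(direction == "up"),
        int(direction == "down"),
        int(food_dy < 0),   # food up
        int(food_dy > 0),   # food down
        int(food_dx > 0),   # food right
        int(food_dx < 0),   # food left
        int(danger["straight"]),
        int(danger["right"]),
        int(danger["left"]),
    ]

state = build_state("right", food_dx=2, food_dy=-1,
                    danger={"straight": False, "right": True, "left": False})
print(len(state))  # 11
```

This vector is what feeds the 11-node input layer described earlier.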