snake_ai

clip of a pretrained model playing snake (2.5M timesteps, 12 parallel Envs trained)

2022-06-03.10-11-07.mp4

This project contains my own implementation of snake in the form of a stable_baselines3 VecEnv, and a training script that trains an agend based on PPO. To get more information on stable_baselines3's PPO, check out https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html

The observation space is a 12 dimensional vector, containing:

relative apple direction
- e.g. [1, 0, 0, 0] -> apple is above
snake head direction
- e.g. [0, 1, 0, 0] -> snake is headed right
is there an obstacle next to the head?
- e.g. [0, 1, 0, 1] there is an obstacle to the left and right of the snake's head

the loss function:

eating an apple: +100
dying: -100
for every step: ((1 / distance(apple, head)) - 0.5) * 10

TODO:

play around with observation space
- (context: In the beginning, the observation space was just the entire frame. The snake didn't seem to be improving a lot, so maybe there are even better observation spaces)
play around with loss function
- e.g. change the distance reward to "punish for going further away, reward for getting closer" instead of this distance formula

Encrux/snake_ai

snake_ai