This project was a programming assignment for the Multi Agent Systems course at VU Amsterdam, intended as an introduction to the basics of Reinforcement Learning. The agent can learn an optimal policy with either Q-Learning or the SARSA algorithm. Functions are also provided to visualize the state values and the learnt policy, which makes it easier to compare the two algorithms. Training uses learning rate decay to help balance exploration and exploitation.
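
To make the difference between the two algorithms concrete, here is a minimal sketch of the tabular update rules. It is not taken from this repository: the function names, the `Q` table layout (states as rows, actions as columns), and the `alpha0 / (1 + decay * episode)` schedule are assumptions chosen for illustration, and the actual implementation may differ.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action value in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])


def decayed_learning_rate(alpha0, decay, episode):
    """Hypothetical inverse-time schedule; the decay form is an assumption."""
    return alpha0 / (1.0 + decay * episode)


def state_values_and_policy(Q):
    """Greedy state values and policy, e.g. as input for a visualization."""
    return Q.max(axis=1), Q.argmax(axis=1)


if __name__ == "__main__":
    Q = np.zeros((2, 2))
    Q[1] = [1.0, 3.0]  # pretend state 1 already has some estimates
    q_learning_update(Q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
    sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
    print(Q[0])
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum action value in the next state, while SARSA uses the value of the action the agent actually selects next. This is why the two can learn different policies under the same exploration strategy.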