This project compares the SSE and steps-to-goal curves of three temporal-difference (TD) learning algorithms on a 3x3 grid world.
The algorithms evaluated are:
- Q-Learning
- Double Q-Learning
- SARSA(λ)
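
To make the comparison concrete, here is a minimal sketch of how tabular Q-Learning might be run on a 3x3 grid while recording a steps-to-goal curve. It is not taken from the project's notebook. The start cell, goal cell, reward scheme, hyperparameters, and the names `step` and `q_learning` are all assumptions chosen for illustration.

```python
import random

N = 3                                          # grid side length (3x3)
START, GOAL = (0, 0), (2, 2)                   # assumed start and goal cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Apply a move; walking into a wall leaves the state unchanged."""
    r = min(max(state[0] + action[0], 0), N - 1)
    c = min(max(state[1] + action[1], 0), N - 1)
    nxt = (r, c)
    reward = 0.0 if nxt == GOAL else -1.0      # assumed reward: -1 per non-goal step
    return nxt, reward, nxt == GOAL


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
               max_steps=1000, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    Q = {((r, c), a): 0.0
         for r in range(N) for c in range(N) for a in range(n_actions)}
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = START, 0, False
        while not done and steps < max_steps:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            # Off-policy target: greedy value of the next state (0 at the goal).
            best_next = 0.0 if done else max(Q[(nxt, i)] for i in range(n_actions))
            Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
            state = nxt
            steps += 1
        steps_per_episode.append(steps)
    return Q, steps_per_episode


if __name__ == "__main__":
    _, curve = q_learning()
    print(curve[:5], curve[-5:])               # early vs. late episode lengths
```

The other two algorithms would reuse the same loop and change only the update. Double Q-Learning keeps two value tables and uses one to select the next action and the other to evaluate it, while SARSA(λ) is on-policy and spreads each TD error over recently visited state-action pairs through eligibility traces.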