My solutions to selected exercises, and implementations of selected algorithms, from Reinforcement Learning: An Introduction (Sutton and Barto, 2018).
All code is in Jupyter notebooks.
- Bandits (Chapter 2): The 10-armed Testbed
- Dynamic Programming (Chapter 4): Gambler's Problem (Ex 4.9)
- Monte Carlo Methods (Chapter 5): Racetrack (Ex 5.12)
- TD Learning (Chapter 6): Windy Gridworld (Example 6.5)
- Planning and Learning with Tabular Methods (Chapter 8): Dyna-Q
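For a flavor of what the notebooks cover, here is a minimal sketch of an epsilon-greedy agent on a 10-armed testbed with sample-average value estimates. It is not the notebook code: the function name `run_bandit`, its parameters, and the use of the standard-library `random` module are assumptions made for illustration.

```python
import random


def run_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent on a k-armed Gaussian bandit (illustrative sketch)."""
    rng = random.Random(seed)
    # True action values are drawn from a standard normal distribution.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(k)]
    estimates = [0.0] * k  # sample-average estimates of each action's value
    counts = [0] * k       # number of times each action has been taken
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: pick a uniformly random arm
        else:
            # Exploit: pick the arm with the highest estimate (first one on ties).
            action = max(range(k), key=lambda a: estimates[a])

        # Reward is the arm's true value plus unit-variance Gaussian noise.
        reward = rng.gauss(true_values[action], 1.0)
        counts[action] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return total_reward / steps, true_values, estimates, counts


if __name__ == "__main__":
    avg_reward, true_values, estimates, counts = run_bandit()
    print(f"average reward per step: {avg_reward:.3f}")
    print(f"most-pulled arm: {counts.index(max(counts))}")
```

Averaging the per-step reward over many independent runs (different seeds) and comparing several values of `epsilon` gives learning curves of the kind the testbed is used for.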