Python implementations of the tabular RL algorithms in Sutton & Barto 2017 (Reinforcement Learning: An Introduction). Both the environment and the algorithms are built using only NumPy and basic Python data structures (list, tuple, set, and dictionary).
- Introduction to the Gridworld Environment
- Policy Evaluation and Improvement
- Policy Iteration
- Value Iteration
- Monte Carlo Prediction
- Monte Carlo Exploring Starts
- On-policy Monte Carlo
- Off-policy Monte Carlo
- TD Prediction
- SARSA - On-policy Control
- Q-learning - Off-policy Control
- Double Q-learning - Off-policy Control
- n-step TD Prediction
- n-step SARSA - On-policy Control
- n-step Off-policy Learning by Importance Sampling
- n-step Off-policy Learning without Importance Sampling
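
As a rough illustration of the NumPy-only style described above, here is a minimal sketch of a gridworld paired with value iteration. It is not taken from this repository's code: the 4x4 layout with two terminal corners, the reward of -1 per step, the names `step` and `value_iteration`, and the default `gamma` and `theta` are all assumptions made for this example.

```python
import numpy as np

# Hypothetical 4x4 gridworld (illustrative, not the repository's environment):
# states are numbered 0..15 row by row, and the two opposite corners are terminal.
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
TERMINALS = {0, N * N - 1}


def step(state, action):
    """Deterministic transition: return (next_state, reward).

    Moving into a wall leaves the agent where it is. Every move from a
    non-terminal state costs -1; terminal states are absorbing with reward 0.
    """
    if state in TERMINALS:
        return state, 0.0
    row, col = divmod(state, N)
    d_row, d_col = action
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    return new_row * N + new_col, -1.0


def value_iteration(gamma=1.0, theta=1e-8):
    """In-place value iteration; returns the state-value array V."""
    V = np.zeros(N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue
            # Bellman optimality backup: best one-step return over all actions.
            best = max(r + gamma * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once a full sweep changes nothing noticeable
            return V


if __name__ == "__main__":
    print(value_iteration().reshape(N, N))
```

Under these assumptions, each cell's value is the negative of the number of steps to the nearest terminal corner, so reshaping the result to `(N, N)` gives a quick visual check of the sweep.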