Overview
Implementations of reinforcement learning algorithms.
Table of contents
- Multi-armed Bandits
  - Bandits environment implementation (k-armed w/ optional non-stationarity)
  - ε-greedy policy
  - Upper Confidence Bound policy
  - Policy gradient
- Dynamic Programming
  - Policy Iteration (Policy Evaluation + Policy Improvement)
  - Value Iteration
- Monte-Carlo
  - MC on-policy value function estimation
  - MC on-policy first-visit ε-greedy
  - MC off-policy every-visit w/ weighted importance sampling
- Temporal Difference
  - SARSA
  - Q-Learning
  - Expected SARSA
  - Double Q-Learning
- n-step Bootstrapping (TODO)
- Planning and Learning
  - Maze environment implementation
  - Dyna-Q
  - Dyna-Q w/ prioritized sweeping
  - Dyna-Q+
Prerequisites
- Conda
Installation
Resources
Authors
- Nassim Habbash - nhabbash