An implementation of several solutions to a simplified version of blackjack, as described in 'Reinforcement Learning: An Introduction' (Richard S. Sutton, Andrew G. Barto).
Solutions implemented:
- Monte Carlo ES (Monte Carlo control with Exploring Starts)
- On-policy first-visit Monte Carlo control (for epsilon-soft policies)
- Off-policy Monte Carlo control
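To illustrate the first method, here is a minimal, self-contained sketch of Monte Carlo ES on the book's blackjack setup. It is not this repository's code. It assumes an infinite deck (cards drawn with replacement), a dealer who sticks on 17 or more, states of the form (player sum 12 to 21, dealer's showing card, usable ace), the book's initial policy of sticking only on 20 or 21, and no special handling of naturals. The names `play_episode`, `monte_carlo_es`, `HIT` and `STICK` are hypothetical.

```python
import random
from collections import defaultdict

STICK, HIT = 0, 1  # hypothetical action encoding
ACTIONS = (STICK, HIT)  # order matters: max() breaks ties toward STICK


def draw_card(rng):
    # Infinite-deck assumption: 1 is an ace, face cards count as 10.
    return min(rng.randint(1, 13), 10)


def dealer_total(cards):
    total = sum(cards)
    # Count one ace as 11 when that does not bust the dealer.
    if 1 in cards and total + 10 <= 21:
        return total + 10
    return total


def play_episode(policy, state, action, rng):
    """Play one episode from the exploring start (state, action).

    Returns the visited (state, action) pairs and the terminal reward.
    """
    player_sum, dealer_up, usable_ace = state
    dealer_cards = [dealer_up, draw_card(rng)]
    visited = []
    while True:
        visited.append(((player_sum, dealer_up, usable_ace), action))
        if action == STICK:
            break
        # A newly drawn ace counts as 1 because the sum is already >= 12.
        player_sum += draw_card(rng)
        if player_sum > 21 and usable_ace:
            player_sum -= 10  # demote the usable ace from 11 to 1
            usable_ace = False
        if player_sum > 21:
            return visited, -1  # player busts
        action = policy[(player_sum, dealer_up, usable_ace)]
    # Dealer hits until reaching 17 or more.
    while dealer_total(dealer_cards) < 17:
        dealer_cards.append(draw_card(rng))
    dealer = dealer_total(dealer_cards)
    if dealer > 21 or player_sum > dealer:
        return visited, 1
    return visited, (0 if player_sum == dealer else -1)


def monte_carlo_es(num_episodes, seed=0):
    rng = random.Random(seed)
    states = [(s, d, a) for s in range(12, 22) for d in range(1, 11) for a in (False, True)]
    # Initial policy, as in the book's example: stick only on 20 or 21.
    policy = {st: STICK if st[0] >= 20 else HIT for st in states}
    q = defaultdict(float)
    n = defaultdict(int)
    for _ in range(num_episodes):
        # Exploring start: every state-action pair can begin an episode.
        start_state = rng.choice(states)
        start_action = rng.choice(ACTIONS)
        visited, reward = play_episode(policy, start_state, start_action, rng)
        # With gamma = 1 and only a terminal reward, the return G equals the reward
        # at every step; states cannot repeat, so every visit is a first visit.
        for st, act in visited:
            n[(st, act)] += 1
            q[(st, act)] += (reward - q[(st, act)]) / n[(st, act)]  # incremental mean
            policy[st] = max(ACTIONS, key=lambda a: q[(st, a)])  # greedy improvement
    return policy, q
```

For example, `policy, q = monte_carlo_es(500_000)` returns a greedy policy keyed by `(player_sum, dealer_showing, usable_ace)`. The episode count here is arbitrary: Monte Carlo ES approaches the optimal policy only as the number of episodes grows. The sample-average update is used because every return is weighted equally. A constant step size would be a reasonable alternative for nonstationary settings.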
All three solutions converge to the optimal policy shown below: