JCK-1096/Bandit-and-Reinforcement-Learning

Python implementations of reinforcement learning algorithms: bandit algorithms, Markov decision processes (MDPs), dynamic programming (value and policy iteration), and model-free control (off-policy Monte Carlo and Q-learning).
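To give a flavor of the bandit side of this list, here is a minimal sketch of an epsilon-greedy agent on a Gaussian multi-armed bandit. It is not taken from the repository: the function name `epsilon_greedy_bandit`, its parameters, and the unit-variance reward model are all assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical helper).

    true_means: assumed mean reward of each arm (rewards have std 1.0).
    Returns (estimates, counts): the sample-mean reward and pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running sample-mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    est, cnt = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print("estimated means:", [round(e, 2) for e in est])
    print("pull counts:", cnt)
```

The incremental mean update avoids storing past rewards. The same update with a fixed step size in place of `1 / counts[arm]` is the form that Q-learning style methods typically build on.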

Python
