Purely theoretical study done over the summer, based on the 2019 Stanford CS234 lectures and lecture notes
Lec 11~13: Fast RL... Multi-Armed Bandit, Regret, Upper Confidence Bound, Hoeffding's Inequality, Probably Approximately Correct
...further study required...
Stanford lectures provided on YouTube
and lecture notes given on the Stanford website
http://web.stanford.edu/class/cs234/CS234Win2019/schedule.html
and textbook: Sutton, Richard S., and Andrew Barto. Reinforcement Learning: An Introduction. The MIT Press, 2018.