Purely theoretical study done over the summer, based on the 2019 Stanford CS234 lectures and lecture notes
Lec 11~13: Fast RL... Multi-Armed Bandit, Regret, Upper Confidence Bound, Hoeffding's Inequality, Probably Approximately Correct
...further study required...
-
Stanford lectures provided on YouTube
and lecture notes given on the Stanford website
https://www.youtube.com/watch?v=FgzM3zpZ55o&list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u
http://web.stanford.edu/class/cs234/CS234Win2019/schedule.html
and the textbook: Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 2018.