laphisboy/RL_Summer

CS234 Study

A purely theoretical study of reinforcement learning, done over the summer and based on the 2019 Stanford CS234 lectures and lecture notes.

Lec 1~2 : Basics of RL - RL characteristics, Markov Processes (MP, MRP, MDP), Policy/Value Iteration
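
A minimal Python sketch of value iteration, for illustration only. It applies the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) [r + gamma * V(s')] to every state until the largest change falls below a tolerance, then reads off the greedy policy. The two-state MDP, the (probability, next_state, reward) transition format, and the names `value_iteration` and `q_value` are hypothetical choices for this example, not code from the lectures.

```python
def q_value(P, V, s, a, gamma):
    """Expected one-step return of action a in state s, bootstrapping from V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Synchronous value iteration on a finite MDP; returns (V, greedy policy)."""
    V = [0.0] * len(P)
    while True:
        # Bellman optimality backup applied to every state using the old V
        V_new = [max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
                 for s in range(len(P))]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy extraction with respect to the converged values
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
              for s in range(len(P))]
    return V, policy


# Hypothetical two-state MDP, for illustration only.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (r=0) or move to 1 (r=1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (r=2) or move to 0 (r=0)
]
V, policy = value_iteration(P)
print([round(v, 3) for v in V], policy)  # [19.0, 20.0] [1, 0]
```

Staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19, which matches the printed values. Policy iteration instead alternates full policy evaluation with greedy improvement.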

Lec 3~4 : Monte Carlo, TD, SARSA, Q-learning
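
The core update rules behind these methods fit in a few lines. Below is a hedged sketch using plain Python lists as tabular value tables: Monte Carlo uses the full discounted return G_t, TD(0) and SARSA bootstrap from the next state's current estimate, and Q-learning bootstraps from the greedy action in the next state, which is what makes it off-policy. The function names and argument order are assumptions for illustration.

```python
def mc_returns(rewards, gamma):
    """Discounted return G_t for every step of one finished episode (Monte Carlo target)."""
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return returns[::-1]


def td0_update(V, s, r, s_next, alpha, gamma, done):
    """TD(0) prediction: move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy TD control: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done):
    """Off-policy TD control: bootstrap from the greedy action in s'."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


print(mc_returns([1, 0, 2], gamma=0.5))  # [1.5, 1.0, 2.0]

Q = [[0.0, 0.0], [1.0, 3.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9, done=False)
# target = 1 + 0.9 * max(1, 3) = 3.7, so Q[0][1] moves halfway there, to about 1.85

Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
sarsa_update(Q_sarsa, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9, done=False)
# target = 1 + 0.9 * Q[1][0] = 1.9, so Q_sarsa[0][1] becomes about 0.95
```

The same transition gives different updates because SARSA uses the next action the behaviour policy actually took (here action 0), while Q-learning always uses the best one.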

Lec 5~7 : Function Approximation, DQN, Double DQN, Dueling DQN, Prioritized Experience Replay (PER), Imitation Learning
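
A sketch of how the DQN variants differ, with plain lists standing in for network outputs (no neural network or replay buffer is implemented, and all names are hypothetical). Vanilla DQN lets the target network both pick and evaluate the next action; Double DQN picks it with the online network and evaluates it with the target network to reduce overestimation; the dueling head combines V(s) and A(s,a) using the mean-subtracted advantage; proportional PER samples transition i with probability p_i^alpha / sum_k p_k^alpha, where p_i = |delta_i| + eps. The alpha and eps defaults below are illustrative assumptions.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def dqn_target(r, q_target_next, gamma, done):
    """Vanilla DQN: the target network both selects and evaluates a'."""
    return r if done else r + gamma * max(q_target_next)


def double_dqn_target(r, q_online_next, q_target_next, gamma, done):
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done:
        return r
    return r + gamma * q_target_next[argmax(q_online_next)]


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER sampling probabilities from TD errors."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Hypothetical network outputs for the next state s'
print(dqn_target(0.0, [4.0, 2.0], 0.5, False))                     # 2.0
print(double_dqn_target(0.0, [1.0, 5.0], [4.0, 2.0], 0.5, False))  # 1.0
print(dueling_q(1.0, [1.0, 3.0]))                                  # [0.0, 2.0]
```

In the example the target network's estimate for action 0 (4.0) is high, but the online network prefers action 1, so Double DQN evaluates action 1 instead and produces a smaller target.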

Lec 8~10 : Policy Gradient Algorithms (still trying to fully understand these)
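
A minimal REINFORCE (Monte Carlo policy gradient) sketch, assuming a hypothetical two-armed bandit with deterministic rewards, so every episode is a single step and the return is just the reward. The policy is a softmax over preferences theta, for which the derivative of log pi(a) with respect to theta_k is 1[k = a] - pi(k). A running-average baseline is subtracted from the return to reduce variance; since it does not depend on the current action, it does not bias the gradient. Step sizes and names are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, baseline_lr=0.05, seed=0):
    """REINFORCE with a running-average baseline on a bandit (one-step episodes).
    Returns the final action probabilities of the softmax policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # softmax action preferences
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample a ~ pi
        G = arm_rewards[a]  # return of a one-step episode
        advantage = G - baseline
        # Gradient ascent on advantage * log pi(a); d log pi(a) / d theta_k = 1[k == a] - pi(k)
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi
        baseline += baseline_lr * (G - baseline)
    return softmax(theta)


# Hypothetical bandit: arm 0 always pays 0, arm 1 always pays 1.
probs = reinforce_bandit([0.0, 1.0])
print([round(p, 3) for p in probs])  # most probability mass should end up on arm 1
```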

Lec 11~13 : Fast RL - Multi-Armed Bandits, Regret, Upper Confidence Bound (UCB), Hoeffding's Inequality, Probably Approximately Correct (PAC)
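
A sketch of UCB1 on a simulated Bernoulli bandit, together with the Hoeffding confidence radius that motivates its bonus. For n i.i.d. samples in [0, 1], P(|mean - mu| >= eps) <= 2 exp(-2 n eps^2), so eps = sqrt(ln(2/delta) / (2n)) holds with probability at least 1 - delta. The bonus sqrt(2 ln t / N(a)) is the standard UCB1 choice; the constant used in the lecture may differ. Arm means, horizon, and function names are hypothetical, and the reported regret is the pseudo-regret sum_a N(a) * (mu* - mu_a).

```python
import math
import random


def hoeffding_radius(n, delta):
    """Width eps with P(|sample mean - true mean| >= eps) <= delta for n i.i.d.
    samples in [0, 1], solved from 2 * exp(-2 * n * eps**2) = delta."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


def ucb1(means, horizon, seed=0):
    """UCB1 on a simulated Bernoulli bandit; returns pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each estimate is defined
        else:
            # Optimism under uncertainty: empirical mean plus confidence bonus
            arm = max(range(k), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    best = max(means)
    # Pseudo-regret: pull count of each arm times its gap to the best arm
    regret = sum(c * (best - m) for c, m in zip(counts, means))
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], horizon=2000)
print(counts, round(regret, 1))
print(round(hoeffding_radius(100, 0.05), 4))  # 0.1358
```

The bonus shrinks as an arm is pulled more, so suboptimal arms are tried only often enough to rule them out, which is why regret grows logarithmically rather than linearly in the horizon.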

...further study required...

Lec 15 : Batch RL and Importance Sampling
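
A hedged sketch of off-policy evaluation with per-trajectory importance sampling, one of the basic tools for batch RL. Each logged trajectory's return is reweighted by the product of pi(a|s) / b(a|s) over its steps, where b is the behaviour policy that collected the data. The function returns both the ordinary estimator (unbiased, but possibly high-variance) and the weighted estimator (biased, usually lower variance). The one-step problem, both policies, and the function name are hypothetical.

```python
import random


def importance_sampling_estimates(episodes, pi, b, gamma=1.0):
    """Ordinary and weighted importance-sampling estimates of the target policy's
    value from episodes collected under behaviour policy b.
    Each episode is a list of (state, action, reward) tuples."""
    weights, returns = [], []
    for episode in episodes:
        w, G, discount = 1.0, 0.0, 1.0
        for s, a, r in episode:
            w *= pi(s, a) / b(s, a)  # likelihood ratio of the whole trajectory
            G += discount * r
            discount *= gamma
        weights.append(w)
        returns.append(G)
    weighted_sum = sum(w * G for w, G in zip(weights, returns))
    ordinary = weighted_sum / len(episodes)
    weighted = weighted_sum / sum(weights)
    return ordinary, weighted


# Hypothetical one-step problem: action 1 pays 1, action 0 pays 0.
def pi(s, a):
    return 0.9 if a == 1 else 0.1  # target policy to evaluate (true value 0.9)


def b(s, a):
    return 0.5  # uniform behaviour policy that produced the data


rng = random.Random(0)
episodes = []
for _ in range(20000):
    a = rng.randrange(2)
    episodes.append([(0, a, float(a))])
ordinary, weighted = importance_sampling_estimates(episodes, pi, b)
print(round(ordinary, 2), round(weighted, 2))  # both should be close to 0.9
```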

...further study required...

References:

Stanford CS234 (2019) lectures on YouTube:
https://www.youtube.com/watch?v=FgzM3zpZ55o&list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u
Stanford CS234 (Winter 2019) lecture notes and schedule on the course website:
http://web.stanford.edu/class/cs234/CS234Win2019/schedule.html
Textbook: Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. 2nd ed., The MIT Press, 2018.