An open course on reinforcement learning in the wild. Taught on-campus at HSE and YSDA and maintained to be friendly to online students (both English and Russian).
Note: this branch is the on-campus version for spring 2019 YSDA and HSE students. For full course materials, switch to the master branch.
- Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
- Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that lets you “feel” it on a practical problem.
- Git-course. Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!
- Chat room for YSDA & HSE students is here
- Grading rules for YSDA & HSE students are here
- FAQ: About the course, Technical issues thread, Lecture Slides, Online Student Survival Guide
- Anonymous feedback form.
- Virtual course environment:
  - Installing dependencies on your local machine (recommended).
  - Google Colab: choose Open -> GitHub -> yandexdataschool/Practical_RL -> {branch name} and select any notebook you want.
  - Alternatives: and Azure Notebooks.
The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.
- week01_intro Introduction
  - Lecture: RL problems around us. Decision processes. Stochastic optimization, cross-entropy method. Parameter space search vs action space search.
  - Seminar: Welcome to OpenAI Gym. Tabular CEM for Taxi-v0, deep CEM for box2d environments.
  - Homework description - see week1/README.md.
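As a rough illustration of the cross-entropy method as a parameter-space search, here is a minimal sketch on a toy objective. It is not the seminar code: the function name `cross_entropy_method`, the quadratic `score_fn`, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_iters=50, pop_size=100,
                         elite_frac=0.2, seed=0):
    """Maximize score_fn over R^dim by refitting a Gaussian to elite samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameter vectors from the current search distribution.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([score_fn(x) for x in samples])
        # Keep the best-scoring candidates and refit the Gaussian to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # The small constant keeps the search from collapsing to zero variance.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

# Hypothetical toy objective whose maximum is at `target`.
target = np.array([0.5, -1.0])
best = cross_entropy_method(lambda x: -np.sum((x - target) ** 2), dim=2)
```

In an RL setting the score would typically be the return of a policy with the sampled parameters, which is what makes this a search in parameter space rather than action space.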
- week02_value_based Value-based methods
  - Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
  - Seminar: Value iteration.
  - Homework description - see week2/README.md.
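For orientation, value iteration on a discounted MDP can be sketched as below. This is a minimal sketch, assuming the MDP is given as dense arrays `P[s, a, s2]` (transition probabilities) and `R[s, a]` (expected immediate reward). The two-state MDP is a made-up example, not one from the course.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return optimal state values and a greedy policy for a tabular MDP."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2)
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)

# Hypothetical MDP: staying in state 1 (action 0) pays reward 1 per step.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # state 0: stay / move to state 1
              [[0.0, 1.0], [1.0, 0.0]]])  # state 1: stay / move to state 0
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, policy = value_iteration(P, R, gamma=0.9)
```

With `gamma=0.9` the values should approach `V(1) = 1 / (1 - 0.9) = 10` and `V(0) = 0.9 * 10 = 9`, with the greedy policy moving from state 0 to state 1 and then staying there.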
- week03_model_free Model-free reinforcement learning
  - Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(Lambda).
  - Seminar: Q-learning vs SARSA vs Expected Value SARSA
  - Homework description - see week3/README.md.
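The three seminar algorithms differ mainly in the bootstrap term of the one-step TD target. The sketch below contrasts them for a tabular `Q[state, action]` array. The helper name `td_target` and the epsilon-greedy behaviour policy are assumptions for illustration, and terminal states are not handled.

```python
import numpy as np

def td_target(kind, Q, reward, next_state, next_action=None,
              gamma=0.99, epsilon=0.1):
    """One-step TD target for Q-learning, SARSA, or Expected Value SARSA."""
    q_next = Q[next_state]
    if kind == "q_learning":
        # Off-policy: bootstrap from the greedy action in the next state.
        bootstrap = q_next.max()
    elif kind == "sarsa":
        # On-policy: bootstrap from the action actually taken next.
        bootstrap = q_next[next_action]
    elif kind == "expected_sarsa":
        # Expectation of Q over an epsilon-greedy policy in the next state.
        probs = np.full(len(q_next), epsilon / len(q_next))
        probs[q_next.argmax()] += 1.0 - epsilon
        bootstrap = probs @ q_next
    else:
        raise ValueError(f"unknown kind: {kind}")
    return reward + gamma * bootstrap

# Hypothetical Q-table with 2 states and 2 actions.
Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])
kwargs = dict(Q=Q, reward=1.0, next_state=1, gamma=0.5, epsilon=0.2)
t_q = td_target("q_learning", **kwargs)                  # 1 + 0.5 * 3
t_sarsa = td_target("sarsa", next_action=0, **kwargs)    # 1 + 0.5 * 1
t_exp = td_target("expected_sarsa", **kwargs)            # 1 + 0.5 * (0.1*1 + 0.9*3)
```

Each target would then be used in the usual update `Q[s, a] += alpha * (target - Q[s, a])`.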
- week04 Approximate (deep) RL
- week05 Exploration
- week06 Policy Gradient methods
- week07 Applications I
- week{++i} Partially Observed MDP
- week{++i} Advanced policy-based methods
- week{++i} Applications II
- week{++i} Distributional reinforcement learning
- week{++i} Inverse RL and Imitation Learning
Course materials and teaching by: [unordered]
- Pavel Shvechikov - lectures, seminars, hw checkups, reading group
- Nikita Putintsev - seminars, hw checkups, organizing our hot mess
- Alexander Fritsler - lectures, seminars, hw checkups
- Oleg Vasilev - seminars, hw checkups, technical support
- Dmitry Nikulin - tons of fixes, far and wide
- Mikhail Konobeev - seminars, hw checkups
- Ivan Kharitonov - seminars, hw checkups
- Ravil Khisamov - seminars, hw checkups
- Fedor Ratnikov - admin stuff
- Using pictures from Berkeley AI course
- Massively referring to CS294
- Several TensorFlow assignments by Scitator
- A lot of fixes from arogozhnikov
- Other awesome people: see github contributors
- Alexey Umnov helped us a lot during spring2018