A course on reinforcement learning in the wild. Taught on-campus at HSE (in Russian) and maintained to be friendly to online students (both English and Russian).
- Optimize for the curious. For all the materials that aren't covered in detail, there are links to more information and related materials (D. Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
- Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you "feel" it on a practical problem.
- Git-course. Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!
- Lecture slides are here.
- Telegram chat room for YSDA & HSE students is here
- Online student survival guide
- Installing the libraries - guide and issues thread
- Magical button that launches you into the course environment:
- Anonymous feedback form for everything that didn't go through e-mail.
- About the course
- A large list of RL materials - awesome rl
- Reading group chat room
- Everyone who wants to attend the RL reading group, ping Pavel Shvechikov at 1xolodec@gmail.com
- 2017.12.29 - HSE track for fall'2017 is officially over. Next is spring'18 @ HSE & YSDA.
- 2017.10.02 - week4 homework is yet to be published; week3 and week4 deadlines are shifted one week into the future.
- 2017.09.24 - Week3 homework published; we're sorry for the delay.
- 2017.09.13 - Gym website seems to have gone down indefinitely. Therefore:
  - week0 homework: Bonus I counts as 2 points if you beat mean reward +5.0 on Taxi-v1 or +0.95 on frozenlake8x8.
  - week1 homework: Instead of 1 point for task 2.2 and 3 points for 2.3, you get 4 points for 2.3.
  - Since you can't submit, just ignore any instructions to do so. We'll push them this weekend to avoid merge conflicts for students.
- 2017.09.04 - first class just happened. Anytask submission form TBA
The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.
- week0 Welcome to Reinforcement Learning
  - Lecture: RL problems around us. Decision processes. Basic genetic algorithms
  - Seminar: Welcome to OpenAI Gym, basic genetic algorithms
  - Homework description - see week0/README.md
- week1 RL as blackbox optimization
  - Lecture: Recap on genetic algorithms; evolutionary strategies. Stochastic optimization, crossentropy method. Parameter space search vs action space search.
  - Seminar: Tabular CEM for Taxi-v0, deep CEM for box2d environments.
  - Homework description - see week1/README.md
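To make the crossentropy method concrete, here is a minimal numpy sketch of one tabular CEM update. It assumes the policy is stored as an `n_states x n_actions` probability matrix and each session is recorded as lists of state and action indices plus a total reward; the function names `select_elites` and `update_policy` are hypothetical, not taken from the week1 notebooks.

```python
import numpy as np


def select_elites(states_batch, actions_batch, rewards_batch, percentile=50):
    """Keep (state, action) pairs from sessions whose total reward reaches the percentile threshold."""
    threshold = np.percentile(rewards_batch, percentile)
    elite_states, elite_actions = [], []
    for states, actions, total_reward in zip(states_batch, actions_batch, rewards_batch):
        if total_reward >= threshold:
            elite_states.extend(states)
            elite_actions.extend(actions)
    return elite_states, elite_actions


def update_policy(elite_states, elite_actions, n_states, n_actions):
    """Refit the tabular policy to action frequencies observed in elite sessions.

    States never visited by an elite session fall back to a uniform distribution.
    """
    counts = np.zeros((n_states, n_actions))
    for s, a in zip(elite_states, elite_actions):
        counts[s, a] += 1
    totals = counts.sum(axis=1, keepdims=True)
    uniform = np.full((n_states, n_actions), 1.0 / n_actions)
    return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)


# Toy usage: three sessions over 3 states and 2 actions.
elite_s, elite_a = select_elites([[0, 1], [0, 1], [0]], [[0, 1], [1, 1], [0]], [1.0, 5.0, 0.0])
policy = update_policy(elite_s, elite_a, n_states=3, n_actions=2)
```

A common tweak is to blend the refit matrix with the previous policy instead of replacing it outright, so that actions missing from one batch of elites do not immediately drop to zero probability.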
- week2 Value-based methods
  - Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
  - Seminar: Value iteration.
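A minimal plain-Python sketch of value iteration, assuming a hypothetical toy MDP stored as `P[state][action] = [(probability, next_state, reward), ...]`, with terminal states mapping to an empty dict. It repeats the Bellman optimality backup V(s) <- max_a sum_s' p(s'|s,a) * (r + gamma * V(s')) until the values stop changing, then reads off the greedy policy.

```python
def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following the values V."""
    return sum(p * (r + gamma * V[s_next]) for p, s_next, r in P[s][a])


def value_iteration(P, gamma=0.9, n_iter=1000, tol=1e-8):
    """Bellman optimality backups until the largest value change drops below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(n_iter):
        # Terminal states (no actions) keep value 0 via the max() default.
        new_V = {s: max((q_value(P, V, s, a, gamma) for a in P[s]), default=0.0) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P if P[s]}
    return V, policy


# Hypothetical 3-state MDP: "A" can idle or move to "B"; "B" pays 10 on reaching terminal "end".
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"finish": [(1.0, "end", 10.0)]},
    "end": {},
}
V, policy = value_iteration(P, gamma=0.9)
```

Policy iteration uses the same `q_value` backup but alternates full evaluation of a fixed policy with greedy improvement, instead of taking the max at every sweep.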
- week3 Model-free reinforcement learning
  - Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(λ).
  - Seminar: Q-learning vs SARSA vs Expected Value SARSA
  - HSE homework deadline: 23.59 13.10.17
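The three seminar algorithms differ only in the TD target used to update Q(s, a). A minimal numpy sketch, assuming `q_next` holds Q(s', ·) for the next state and, for Expected Value SARSA, an epsilon-greedy behaviour policy; the helper names are hypothetical.

```python
import numpy as np


def q_learning_target(r, q_next, gamma):
    """Off-policy: bootstrap from the greedy action in s'."""
    return r + gamma * np.max(q_next)


def sarsa_target(r, q_next, a_next, gamma):
    """On-policy: bootstrap from the action a' the agent actually takes in s'."""
    return r + gamma * q_next[a_next]


def expected_sarsa_target(r, q_next, gamma, epsilon):
    """Bootstrap from the expectation of Q(s', .) under an epsilon-greedy policy."""
    n_actions = len(q_next)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_next)] += 1.0 - epsilon
    return r + gamma * np.dot(probs, q_next)


def td_update(q_sa, target, alpha):
    """Move Q(s, a) a step of size alpha towards the target."""
    return q_sa + alpha * (target - q_sa)


q_next = np.array([1.0, 3.0])
ql = q_learning_target(1.0, q_next, gamma=0.5)
sarsa = sarsa_target(1.0, q_next, a_next=0, gamma=0.5)
esarsa = expected_sarsa_target(1.0, q_next, gamma=0.5, epsilon=0.2)
```

With an exploratory next action, SARSA's target is lower than Q-learning's, which is why SARSA learns a more cautious policy while it keeps exploring.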
- week4_recap - deep learning recap
  - Lecture: Deep learning 101
  - Seminar: Simple image classification with convnets
  - HSE homework deadline: 23.59 13.10.17
- week4 Approximate reinforcement learning
  - Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
  - Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
  - HSE homework deadline: 23.59 20.10.17
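A minimal sketch of an experience replay buffer, assuming transitions are stored as `(s, a, r, s_next, done)` tuples and sampled uniformly; the class name `ReplayBuffer` is illustrative, not the seminar's API. Training on random minibatches of past transitions breaks the correlation between consecutive updates in approximate Q-learning.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random minibatch sampling."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest transition once full.
        self.storage = deque(maxlen=capacity)

    def __len__(self):
        return len(self.storage)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        """Return five parallel lists: states, actions, rewards, next_states, dones."""
        batch = random.sample(list(self.storage), batch_size)
        return tuple(list(column) for column in zip(*batch))


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1, False)
states, actions, rewards, next_states, dones = buffer.sample(2)
```

Only the 3 most recent transitions survive here, so every sampled state comes from steps 2 to 4.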
- week5 Exploration in reinforcement learning
  - Lecture: Contextual bandits. Thompson Sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
  - Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
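A minimal sketch of UCB1 action selection for a plain (non-contextual) multi-armed bandit, assuming the caller tracks per-arm pull counts and summed rewards. Each arm's score is its empirical mean plus an exploration bonus sqrt(2 ln t / n_a) that shrinks as the arm gets pulled more; the function name is hypothetical.

```python
import math


def ucb1_action(counts, reward_sums, t):
    """Pick the arm maximizing mean reward + sqrt(2 ln t / n_a); t is the total number of pulls so far."""
    # Play every arm once before the bonus term is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        reward_sums[arm] / counts[arm] + math.sqrt(2 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)
```

For example, `ucb1_action([100, 1], [60.0, 0.5], t=101)` selects arm 1: its mean (0.5) is lower than arm 0's (0.6), but after a single pull its exploration bonus is far larger. The same score, with a prior constant, is what UCT uses to pick children in MCTS.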
- week6 Policy gradient methods I
  - Lecture: Motivation for policy-based, policy gradient, log-derivative trick, REINFORCE/crossentropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
  - Seminar: REINFORCE, advantage actor-critic
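REINFORCE weights each action's log-probability by the return from that step, while advantage actor-critic replaces the return with an advantage estimate. A minimal numpy sketch of both quantities for a single trajectory, assuming `values` are the critic's estimates V(s_t) and `last_value` is V of the state after the final step (0 if the episode terminated); function names are hypothetical. GAE mixes n-step TD errors with weight lambda: `lam=0` gives one-step TD errors, `lam=1` gives returns minus the baseline.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards from the end of the episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(rewards, values, last_value, gamma, lam):
    """Generalized advantage estimation for one trajectory.

    values[t] is the critic's V(s_t); last_value is V(s_T) after the final step (0 if terminal).
    """
    values_ext = np.append(np.asarray(values, dtype=float), last_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards, values = [1.0, 1.0, 1.0], [0.5, 0.2, 0.1]
returns = discounted_returns(rewards, gamma=0.5)
adv_td = gae_advantages(rewards, values, 0.0, gamma=0.5, lam=0.0)
adv_mc = gae_advantages(rewards, values, 0.0, gamma=0.5, lam=1.0)
```

Intermediate lambda values trade the low variance of one-step TD errors against the low bias of full returns.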
- week7_recap Recurrent neural networks recap
  - Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping
  - Seminar: character-level RNN language model
- week7 Partially observable MDPs
  - Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
  - Seminar: Deep kung-fu & Doom with recurrent A3C and DRQN
- week8 Applications II
  - Lecture: Reinforcement Learning as a general way to optimize non-differentiable loss. G2P, machine translation, conversation models, image captioning, discrete GANs. Self-critical sequence training.
  - Seminar: Simple neural machine translation with self-critical sequence training
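Self-critical sequence training is REINFORCE where the baseline is the reward of the model's own greedy (argmax) decode. A minimal numpy sketch of the loss value, assuming per-token log-probabilities of the sampled sequences, a padding mask, and sequence-level rewards (e.g. BLEU) computed elsewhere; the function name is hypothetical, and in a real model the gradient would come from an autodiff framework rather than numpy.

```python
import numpy as np


def self_critical_loss(sample_logprobs, mask, sample_rewards, greedy_rewards):
    """REINFORCE loss with the greedy decode's reward as baseline.

    sample_logprobs, mask: [batch, time]; sample_rewards, greedy_rewards: [batch].
    """
    # Positive advantage: the sample beat greedy decoding, so minimizing the loss raises its likelihood.
    advantages = np.asarray(sample_rewards) - np.asarray(greedy_rewards)
    # Sum token log-probs over non-padding positions to get each sequence's log-probability.
    seq_logprobs = (np.asarray(sample_logprobs) * np.asarray(mask)).sum(axis=1)
    return -np.mean(advantages * seq_logprobs)


logp = np.log([[0.5, 0.5], [0.25, 1.0]])
mask = np.array([[1.0, 1.0], [1.0, 0.0]])
loss = self_critical_loss(logp, mask, sample_rewards=[1.0, 0.0], greedy_rewards=[0.5, 0.0])
```

The second sequence ties with its greedy decode, so it contributes nothing; only samples that beat or lose to the greedy baseline move the model.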
- week9 Policy gradient methods II
  - Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG. Bonus: DPG for discrete action spaces.
  - Seminar: Approximate TRPO for simple robotic tasks.
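PPO's clipped surrogate keeps the new policy close to the one that collected the data without TRPO's explicit KL constraint. A minimal numpy sketch of the objective (to be maximized), assuming log-probabilities of the taken actions under the new and old policies plus precomputed advantages; `clip_eps=0.2` is a commonly used value, not a setting taken from this course.

```python
import numpy as np


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) over sampled actions."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise min makes the objective a pessimistic bound on the unclipped surrogate.
    return np.mean(np.minimum(unclipped, clipped))


new_logp = np.log([1.5, 0.5])
old_logp = np.log([1.0, 1.0])
objective = ppo_clip_objective(new_logp, old_logp, advantages=np.array([1.0, -1.0]))
```

Here the first action's ratio of 1.5 is capped at 1.2, and the second action's ratio of 0.5 contributes -0.8 instead of -0.5, so pushing probabilities far from the old policy earns no extra credit.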
Course materials and teaching by
- Fedor Ratnikov - lectures, seminars, hw checkups
- Oleg Vasilev - seminars, hw checkups, technical support
- Pavel Shvechikov - lectures, seminars, hw checkups, reading group
- Alexander Fritsler - lectures, seminars, hw checkups
- Using pictures from the Berkeley AI course
- Massively referring to CS294
- Several TensorFlow assignments by Scitator
- A lot of fixes from arogozhnikov
- Other awesome people: see github contributors
- Better support for TensorFlow & PyTorch
- Our notation is now compatible with Sutton's
- Reworked & rebalanced some assignments
- Added more practice on model-based RL