Notes for RL Video Lectures

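Reinforcement learning notes written in Chinese, covering Bolei Zhou's video course, the RLChina video course, and paper reading notes. The material suits readers who already have some background.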

Where my notes live

🥊 Getting started / book notes, GitHub: PiperLiu/Reinforcement-Learning-practice-zh
💻 Paper reading / video course notes, GitHub: PiperLiu/introRL
✨ Algorithms large and small / practice playground, GitHub: PiperLiu/Approachable-Reinforcement-Learning

Catalog

Bolei Zhou's video course
RLChina's video course
Paper reading notes

Overview for Each Course

Bolei_Zhou

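I would characterize Bolei Zhou's video course as follows:

- A solid, systematic introductory course on deep reinforcement learning;
- Like the RL "bible", Reinforcement Learning: An Introduction, it starts from Markov decision processes and value-based methods (DQN), but it also fills in what the book lacks: deep reinforcement learning and the results of recent years;
- For the two lines of SOTA development, see the table below;
- After the SOTA material, it covers model-based reinforcement learning, imitation learning, and large-scale reinforcement learning (distributed systems), at the level of a general introduction;
- It has no content on multi-agent RL (MARL).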

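In addition, the resources Prof. Zhou has collected are of very high quality: the course points to code repositories for classic papers, web demos of basic concepts, and more. I therefore plan to go through the course a second time, focusing on practice.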

For the course outline, see link; for my notes, see notes.

RLChina

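A public-interest event organized by RLChina, a community of leading Chinese reinforcement learning researchers. The quality is high, and the course focuses on filling the gap in available MARL material. It can be read as a series of lectures on advanced topics:

Apart from the introductory background at the start, it covers:
- Applying inference models to control (Control as Inference);
- Imitation learning (Imitation Learning);
- Reinforcement learning under sparse rewards (Learning with Sparse Rewards);
- Game theory (Game Theory Basic);
- Multi-agent systems (Multi-agent Systems) and extensions of MARL (4 sessions in total).
The multi-agent part closes by introducing a new line of thinking: using Mean-Field theory from physics to study the control of large-scale agent populations.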

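The course ran in August 2020, so it is very recent. I did not watch the live stream at the time (a live stream does not let you control the pace yourself). I will finish the course notes before December 2020.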

For the course outline, see link; for my notes, see notes.

Zhou-Readme

Overview

This short RL course introduces the basic knowledge of reinforcement learning. Slides are made in English and lectures are given by Bolei Zhou in Mandarin. The course is for personal educational use only. Please open an issue if you spot some typos or errors in the slides.

Course Schedule

The course is scheduled as follows. There are 10 lectures in total, where the first one was premiered on 16 March 2020 and the last one was finished on 25 May 2020. Thanks for watching and may ReinForce be with you!

Lecture	Topic	Resources
Lecture1	Overview (课程概括与RL基础)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture2	Markov Decision Process (马尔科夫决策过程)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture3	Model-free Prediction and Control (无模型的预测和控制)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture4	Value Function Approximation (价值函数近似)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture5	Policy Optimization: Foundation (策略优化基础篇)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture6	Policy Optimization: State of the art (策略优化进阶篇)	slide, Youtube(part1, part2), Bilibili(part1, part2)
Lecture7	Model-based RL (基于环境模型的RL)	slide, Youtube, Bilibili
Lecture8	Imitation Learning (模仿学习)	slide, Youtube, Bilibili
Lecture9	Distributed systems for RL (分布式系统)	slide, Youtube, Bilibili
Lecture10	RL in a nutshell (课程结局篇)	slide, Youtube, Bilibili
Bonus 1	DeepMind's AlphaStar Explained (剖析星际争霸AI) by Zhenghao Peng	slide, Youtube, Bilibili
