- [Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction )
-
- [Rich Sutton's reinforcement learning course (Alberta)](#Rich Sutton 强化学习课程(Alberta))
- [David Silver's reinforcement learning course (UCL)](#David Silver 强化学习课程(UCL))
- [Stanford reinforcement learning course](#Stanford 强化学习课程)
-
- [UCB deep reinforcement learning course](#UCB 深度强化学习课程)
- [CMU deep reinforcement learning course](#CMU 深度强化学习课程)
- Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction link
- Csaba Szepesvari, Algorithms for Reinforcement Learning link
Course homepage: link
This version is fairly old; a newer one is on Google Drive, and I will organize it when I have time.
Course homepage: link
Slides:
Lecture 1: Introduction to Reinforcement Learning link
Lecture 2: Markov Decision Processes link
Lecture 3: Planning by Dynamic Programming link
Lecture 4: Model-Free Prediction link
Lecture 5: Model-Free Control link
Lecture 6: Value Function Approximation link
Lecture 7: Policy Gradient Methods link
Lecture 8: Integrating Learning and Planning link
Lecture 9: Exploration and Exploitation link
Lecture 10: Case Study: RL in Classic Games link
Course homepage: link
Slides:
Introduction to Reinforcement Learning link
How to act when you know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration. link
Learning to evaluate a policy when you don't know how the world works. link
Model-free learning to make good decisions. Q-learning. SARSA. link
Scaling up: value function approximation. Deep Q Learning. link
Deep reinforcement learning continued. link
Imitation Learning. link
Policy search. link
Policy search. link
Midterm review. link
Fast reinforcement learning (Exploration/Exploitation) Part I. link
Fast reinforcement learning (Exploration/Exploitation) Part II. link
Batch Reinforcement Learning. link
Monte Carlo Tree Search. link
Human-in-the-loop RL with a focus on transfer learning. link
Course homepage: link
Slides:
Introduction and course overview link
Supervised learning and imitation link
Reinforcement learning introduction link
Policy gradients introduction link
Actor-critic introduction link
Value functions introduction link
Advanced Q-learning algorithms link
Optimal control and planning link
Learning dynamical systems from data link
Learning policies by imitating optimal controllers link
Advanced model learning and images link
Connection between inference and control link
Inverse reinforcement learning link
Advanced policy gradients (natural gradient, importance sampling) link
Exploration link
Exploration (part 2) and transfer learning link
Multi-task learning and transfer link
Meta-learning and parallelism link
Advanced imitation learning and open problems link
Course homepage: link
Slides:
Introduction link
Markov decision processes (MDPs), POMDPs link
Solving known MDPs: Dynamic Programming link
Monte Carlo learning: value function (VF) estimation and optimization link
Temporal difference learning: VF estimation and optimization, Q learning, SARSA link
Planning and learning: Dyna, Monte Carlo tree search link
VF approximation, MC, TD with VF approximation, Control with VF approximation link
Deep Q Learning: Double Q learning, replay memory link
Policy Gradients I, Policy Gradients II link link
Continuous Actions, Variational Autoencoders, multimodal stochastic policies link
Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search link
Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial link
Imitation learning III: imitating controllers, learning local models link
Optimal control, trajectory optimization link
End-to-end policy optimization through back-propagation link
Exploration and Exploitation Russ link
Hierarchical RL and Transfer Learning link
Recitation: Trajectory optimization - iterative LQR link
Transfer learning (2): Simulation to Real World link
Memory Augmented RL link
Learning to learn, one shot learning link