/awesome-reinforcement-learning-zh

Reinforcement learning (RL) resources, curated in Chinese

Reinforcement learning materials, from getting started to giving up

  • [Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction)

  • [Algorithms for Reinforcement Learning](#Algorithms for Reinforcement Learning)

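  • Courses

  • Foundational Courses

    • [Rich Sutton's Reinforcement Learning Course (Alberta)](#Rich Sutton's Reinforcement Learning Course (Alberta))
    • [David Silver's Reinforcement Learning Course (UCL)](#David Silver's Reinforcement Learning Course (UCL))
    • [Stanford Reinforcement Learning Course](#Stanford Reinforcement Learning Course)
  • Deep RL Courses

    • [UCB Deep Reinforcement Learning Course](#UCB Deep Reinforcement Learning Course)
    • [CMU Deep Reinforcement Learning Course](#CMU Deep Reinforcement Learning Course)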

Reinforcement Learning: An Introduction

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction link

Algorithms for Reinforcement Learning

  • Csaba Szepesvari, Algorithms for Reinforcement Learning link

Courses

Foundational Courses

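Rich Sutton's Reinforcement Learning Course (Alberta)

Course homepage: link

This version is fairly old. A newer one is available on Google Drive, and I will organize it when I find the time.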

David Silver's Reinforcement Learning Course (UCL)

Course homepage: link

Slides:

Lecture 1: Introduction to Reinforcement Learning link

Lecture 2: Markov Decision Processes link

Lecture 3: Planning by Dynamic Programming link

Lecture 4: Model-Free Prediction link

Lecture 5: Model-Free Control link

Lecture 6: Value Function Approximation link

Lecture 7: Policy Gradient Methods link

Lecture 8: Integrating Learning and Planning link

Lecture 9: Exploration and Exploitation link

Lecture 10: Case Study: RL in Classic Games link

Stanford Reinforcement Learning Course

Course homepage: link

Slides:

Introduction to Reinforcement Learning link

How to act given know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link

Learning to evaluate a policy when don't know how the world works. link

Model-free learning to make good decisions. Q-learning. SARSA. link

Scaling up: value function approximation. Deep Q Learning. link

Deep reinforcement learning continued. link

Imitation Learning. link

Policy search. link

Policy search. link

Midterm review. link

Fast reinforcement learning (Exploration/Exploitation) Part I. link

Fast reinforcement learning (Exploration/Exploitation) Part II. link

Batch Reinforcement Learning. link

Monte Carlo Tree Search. link

Human in the loop RL with a focus on transfer learning. link

Deep RL Courses

UCB Deep Reinforcement Learning Course

Course homepage: link

Slides:

Introduction and course overview link

Supervised learning and imitation link

Reinforcement learning introduction link

Policy gradients introduction link

Actor-critic introduction link

Value functions introduction link

Advanced Q-learning algorithms link

Optimal control and planning link

Learning dynamical systems from data link

Learning policies by imitating optimal controllers link

Advanced model learning and images link

Connection between inference and control link

Inverse reinforcement learning link

Advanced policy gradients (natural gradient, importance sampling) link

Exploration link

Exploration (part 2) and transfer learning link

Multi-task learning and transfer link

Meta-learning and parallelism link

Advanced imitation learning and open problems link

CMU Deep Reinforcement Learning Course

Course homepage: link

Slides:

Introduction link

Markov decision processes (MDPs), POMDPs link

Solving known MDPs: Dynamic Programming link

Monte Carlo learning: value function (VF) estimation and optimization link

Temporal difference learning: VF estimation and optimization, Q learning, SARSA link

Planning and learning: Dyna, Monte Carlo tree search link

VF approximation, MC, TD with VF approximation, Control with VF approximation link

Deep Q Learning: Double Q learning, replay memory link

Policy Gradients I, Policy Gradients II link link

Continuous Actions, Variational Autoencoders, multimodal stochastic policies link

Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search link

Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial link

Imitation learning III: imitating controllers, learning local models link

Optimal control, trajectory optimization link

End-to-end policy optimization through back-propagation link

Exploration and Exploitation (Russ) link

Hierarchical RL and Transfer Learning link

Recitation: Trajectory optimization - iterative LQR link

Transfer learning(2): Simulation to Real World link

Memory Augmented RL link

Learning to learn, one shot learning link