/awesome-reinforcement-learning-zh

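Reinforcement learning (RL) resources, curated in Chinese

Reinforcement learning resources, from getting started to giving up

2018-11-10:
1. Added OpenAI's Spinning Up.
2. Added the course by Hung-yi Lee (李宏毅) of ** University.
3. Added the Multi-Agent Reinforcement Learning Tutorial given at SJTU by Prof. Jun Wang (汪军, UCL) and Prof. Weinan Zhang (张伟楠, SJTU).
4. Updated the UCB and CMU DRL courses to fall 2018.
5. Updated Sutton's book to the final version.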

  • [Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction )

  • [Algorithms for Reinforcement Learning](#Algorithms for Reinforcement Learning)

  • OpenAI-spinningup

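  • Courses

  • Foundational courses

    • [Rich Sutton's Reinforcement Learning Course (Alberta)](#Rich Sutton 强化学习课程(Alberta))
    • [David Silver's Reinforcement Learning Course (UCL)](#David Silver 强化学习课程(UCL))
    • [Stanford Reinforcement Learning Course](#Stanford 强化学习课程)
    • [UCL + SJTU Multi-Agent Reinforcement Learning Tutorial](#Multi-Agent Reinforcement Learning Tutorial)
  • Deep RL courses

    • [** University, Hung-yi Lee: (Deep) Reinforcement Learning](#**大学 李宏毅 (深度)强化学习)
    • [UCB Deep Reinforcement Learning Course](#UCB 深度强化学习课程)
    • [CMU Deep Reinforcement Learning Course](#CMU 深度强化学习课程)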

Reinforcement Learning: An Introduction

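Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. Update: the final version of the second edition (click "online draft"): link. The official copy is hosted on Google Docs, so I downloaded it and put a copy on GitHub; help yourself: link

Note: the print edition can now be ordered. A classmate and I each bought a copy from Amazon overseas, and they have not arrived yet. Within China, JD and Amazon China are options, though they cost a bit more.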

Algorithms for Reinforcement Learning

Csaba Szepesvari, Algorithms for Reinforcement Learning link

OpenAI-spinningup

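This is a fairly mixed kind of "book": online docs, plus the matching code, plus the matching exercises. I strongly recommend reading it alongside the UCL course; I skimmed through it and it is quite good. However, it does not mention the UCL and UCB courses below, or Sutton's book above, so reading them together may work better.

  • Online documentation link
  • Introduction to the basics of reinforcement learning link
  • Advice on deep reinforcement learning link
  • Code link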

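Courses

Foundational courses

Rich Sutton's Reinforcement Learning Course (Alberta)

Course homepage: link

This one is fairly old. There is a newer version on Google Drive; I will tidy it up when I find the time.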

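David Silver's Reinforcement Learning Course (UCL)

Note: this is the course David Silver taught at UCL in 2015. He now seems to be at the peak of his career at DeepMind, so a new course will probably have to wait until the day he decides to go back to a university and train students. Highly recommended as a starting point for building the basic RL concepts. Course homepage: link

Slides:

Lecture 1: Introduction to Reinforcement Learning link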

Lecture 2: Markov Decision Processes link

Lecture 3: Planning by Dynamic Programming link

Lecture 4: Model-Free Prediction link

Lecture 5: Model-Free Control link

Lecture 6: Value Function Approximation link

Lecture 7: Policy Gradient Methods link

Lecture 8: Integrating Learning and Planning link

Lecture 9: Exploration and Exploitation link

Lecture 10: Case Study: RL in Classic Games link

Stanford Reinforcement Learning Course

Note: this is the spring 2018 offering. Course homepage: link

Slides:

Introduction to Reinforcement Learning link

How to act when you know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link

Learning to evaluate a policy when you don't know how the world works. link

Model-free learning to make good decisions. Q-learning. SARSA. link

Scaling up: value function approximation. Deep Q Learning. link

Deep reinforcement learning continued. link

Imitation Learning. link

Policy search. link

Policy search. link

Midterm review. link

Fast reinforcement learning (Exploration/Exploitation) Part I. link

Fast reinforcement learning (Exploration/Exploitation) Part II. link

Batch Reinforcement Learning. link

Monte Carlo Tree Search. link

Human in the loop RL with a focus on transfer learning. link

Multi-Agent Reinforcement Learning Tutorial

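Note: while interning on the advertising side at Alibaba, I was fortunate to work on a paper with Prof. Wang and Prof. Zhang. Along the way I saw how lively and sharp Prof. Wang's thinking is. Prof. Zhang, for his part, strikes me as a rising star of computer science in China, well worth following!

Course homepage: link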

Fundamentals of Reinforcement Learning link

Fundamentals of Game Theory link

Learning in Repeated Games link

Multi-Agent Reinforcement Learning link link

Deep RL courses

** University, Hung-yi Lee: (Deep) Reinforcement Learning

Course homepage: [link](http://speech.ee.ntu.edu.tw/~tlkagk/courses/)

The videos can be watched on Bilibili: link

UCB Deep Reinforcement Learning Course

Course homepage: link

Update: fall 2018

Slides:

Lecture slides: see the syllabus for more information.

Introduction and Course Overview link

Supervised Learning and Imitation link

TensorFlow and Neural Nets Review Session (notebook) link

Reinforcement Learning Introduction link

Policy Gradients Introduction link

Actor-Critic Introduction link

Value Functions and Q-Learning link

Advanced Q-Learning Algorithms link

Advanced Policy Gradients link

Optimal Control and Planning link

Model-Based Reinforcement Learning link

Advanced Model Learning and Images link

Learning Policies by Imitating Other Policies link

Probability and Variational Inference Primer link

Connection between Inference and Control link

Inverse Reinforcement Learning link

Exploration link link

Transfer Learning and Multi-Task Learning link

Meta-Learning link

Parallelism and RL System Design link

Advanced Imitation Learning and Open Problems link

CMU Deep Reinforcement Learning Course

Update: fall 2018

Fall 2018 course homepage: link

2017 course homepage: link

Slides:

Introduction link

Markov decision processes (MDPs), POMDPs link

Solving known MDPs: Dynamic Programming link

Policy iteration, Value iteration, Asynchronous DP link

Monte Carlo Learning, Temporal difference learning, Q learning link

Temporal difference learning (Tom), Planning and learning: Dyna, Monte Carlo tree search link

Deep NN Architectures for RL link

Recitation on Monte Carlo Tree Search link

VF approximation, MC, TD with VF approximation, Control with VF approximation link

Deep Q Learning: Double Q learning, replay memory link

Policy Gradients link link

Advanced Policy Gradients link

Evolution Methods, Natural Gradients link

Natural Policy Gradients, TRPO, PPO, ACKTR link

Pathwise Derivatives, DDPG, multigoal RL, HER link

Exploration vs. Exploitation link link

Exploration and RL in Animals link link

Model-based Reinforcement Learning link

Imitation Learning link

Maximum Entropy Inverse RL, Adversarial imitation learning link

Recitation: Trajectory optimization - iterative LQR link

Learning to learn, one shot learning link