- English
- Reinforcement Learning: An Introduction [Book] [Code] [Preferred] [old version] [newest version]
- Algorithms for Reinforcement Learning [Official]
- OpenAI Spinning Up
- Reinforcement Learning for Sequential Decision and Optimal Control
- Dynamic Programming and Optimal Control
- Deep-Reinforcement-Learning-Hands-On [PDF, 2nd edition]
- Reinforcement Learning and Optimal Control
- Reinforcement Learning: Theory and Algorithms
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
- Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.
- Chinese
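- 动手学强化学习 (Hands-on Reinforcement Learning), Weinan Zhang [Preferred]
- 深度强化学习落地指南 (A Practical Guide to Deploying Deep Reinforcement Learning), Ning Wei
- 深度强化学习 (Deep Reinforcement Learning), Shusen Wang [PDF] [Preferred]
- EasyRL 强化学习教程 (EasyRL Reinforcement Learning Tutorial)
- 深度强化学习 (Deep Reinforcement Learning), Hao Dong [PDF]
- 深入浅出强化学习 (Reinforcement Learning Made Simple), Xian Guo [Preferred]
- 神经网络与深度学习 (Neural Networks and Deep Learning), Xipeng Qiu
- 机器学习 (Machine Learning), Zhi-Hua Zhou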
- UCL. Reinforcement Learning. David Silver. Difficulty: [★]
- UCL. Advanced Topics. David Silver.
- Tencent. Reinforcement Learning. MoFan. Difficulty: [★]
- National Taiwan University. DRL. Hung-Yi LEE. [Preferred]. Difficulty: [★]
- Deep Reinforcement Learning. Shusen Wang. [Bilibili]
- UCLA. Intro to Reinforcement Learning. Bolei Zhou. Difficulty: [★]
- UC Berkeley. CS285 (formerly CS294). Sergey Levine.
- Stanford CS234 RL Emma Brunskill [Bilibili] [Official]
- MIT RL Dimitri Bertsekas
- RL and control THU
- CMU Deep Reinforcement Learning Katerina Fragkiadaki [Link]
- Udacity
- Lex Fridman
- ETH Zurich. Dynamic Programming and Optimal Control. Raffaello D'Andrea.
- Pieter Abbeel
- Advanced Machine Learning (高级机器学习). Jie Tang.
- Shengbo Li (李升波)
- UIUC, CS 542, CS 443, Nan Jiang.
- R. Srikant. UIUC ECE 586.
- Ron Parr. Duke CompSci 590.2.
- Ben Van Roy. Stanford MS&E 338.
- Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
- Susan Murphy. Harvard Stat 234.
- Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
- Daniel Russo. Columbia B9140-001.
- Shipra Agrawal. Columbia IEOR 8100.
- Emma Brunskill CMU 15-889e.
- Philip Thomas. U Mass CMPSCI 687.
- Michael Littman. Brown CSCI2951-F.
- NJU. IntroRL. Yang Yu.
- CMU 16-745
- ASU CSE 691
Approximate Dynamic Programming (ADP) concerns obtaining approximate solutions to large planning problems, often with the help of sampling and function approximation. Many ADP methods can be considered as prototype algorithms for popular value-based RL algorithms used today, especially in the offline setting, so it is important to understand their behaviors and guarantees.
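To make "sampling and function approximation" concrete, here is a minimal sketch of fitted Q-iteration on a fixed batch of sampled transitions. Everything in it is an illustrative assumption rather than something taken from the references in this list: the chain MDP, the `step` and `fitted_q_iteration` names, and the choice of state aggregation (piecewise-constant features) as the function approximator.

```python
import random


def step(state, action, n_states):
    """Hypothetical chain MDP: action 0 moves left, action 1 moves right.

    The reward is 1 whenever the next state is the right end of the chain.
    """
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward


def fitted_q_iteration(n_states=20, group_size=2, gamma=0.9,
                       n_iters=50, n_samples=2000, seed=0):
    """Fitted Q-iteration with state aggregation on an offline batch."""
    rng = random.Random(seed)
    n_groups = (n_states + group_size - 1) // group_size

    def group(s):
        # Function approximation: neighbouring states share one Q-value.
        return s // group_size

    # Sampling: collect a fixed batch of (s, a, r, s') with uniform s and a.
    batch = []
    for _ in range(n_samples):
        s = rng.randrange(n_states)
        a = rng.randrange(2)
        s2, r = step(s, a, n_states)
        batch.append((s, a, r, s2))

    q = [[0.0, 0.0] for _ in range(n_groups)]
    for _ in range(n_iters):
        sums = [[0.0, 0.0] for _ in range(n_groups)]
        counts = [[0, 0] for _ in range(n_groups)]
        for s, a, r, s2 in batch:
            # Regression target: one-step Bellman backup under the current Q.
            target = r + gamma * max(q[group(s2)])
            sums[group(s)][a] += target
            counts[group(s)][a] += 1
        # For piecewise-constant features, the least-squares fit is the
        # per-(group, action) mean of the targets.
        q = [[sums[g][a] / counts[g][a] if counts[g][a] else q[g][a]
              for a in range(2)]
             for g in range(n_groups)]
    return q


if __name__ == "__main__":
    q_values = fitted_q_iteration()
    greedy = [max(range(2), key=lambda a: row[a]) for row in q_values]
    print(greedy)
```

Each iteration is a regression onto Bellman targets computed from the same batch, which is why methods of this shape are a natural prototype for offline value-based RL. Swapping the per-group mean for a different regressor changes the approximator but not the overall loop.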
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
How can we estimate the performance of a policy using data collected from a different policy? This question has important implications for safety and for real-world applications of RL.
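As a minimal sketch of one common answer, the snippet below applies ordinary importance sampling in a one-step (bandit) setting: each logged reward is reweighted by the ratio of target to behavior action probabilities. The two-armed bandit, the function names, and the specific probabilities are all hypothetical and chosen only for illustration.

```python
import random


def collect_bandit_data(behavior_probs, reward_means, n, seed=0):
    """Log (action, reward) pairs from a hypothetical Bernoulli bandit."""
    rng = random.Random(seed)
    actions = range(len(behavior_probs))
    data = []
    for _ in range(n):
        action = rng.choices(actions, weights=behavior_probs)[0]
        reward = 1.0 if rng.random() < reward_means[action] else 0.0
        data.append((action, reward))
    return data


def importance_sampling_estimate(data, target_probs, behavior_probs):
    """Estimate the target policy's expected reward from behavior data."""
    total = 0.0
    for action, reward in data:
        # Reweight each sample by how much more (or less) likely the
        # target policy is to pick this action than the behavior policy.
        weight = target_probs[action] / behavior_probs[action]
        total += weight * reward
    return total / len(data)


if __name__ == "__main__":
    behavior = [0.5, 0.5]      # policy that collected the data
    target = [0.1, 0.9]        # policy we want to evaluate
    reward_means = [0.2, 0.8]  # assumed Bernoulli reward means per action
    logged = collect_bandit_data(behavior, reward_means, n=20000)
    print(importance_sampling_estimate(logged, target, behavior))
```

Under these assumed numbers the true value of the target policy is 0.1 * 0.2 + 0.9 * 0.8 = 0.74, and the estimate should land near it. The estimator is unbiased only when the behavior policy gives nonzero probability to every action the target policy can take, and its variance grows as the two policies diverge.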
- rlcode
- Deep-Reinforcement-Learning-Algorithms-with-PyTorch
- Deep-reinforcement-learning-with-pytorch
- reinforcement-learning [most stars]
- Stable-Baselines3
- Google Dopamine
- Intel Coach
- CleanRL
- Tencent AIArena
- AWS DeepRacer
Conferences: NeurIPS (formerly NIPS), ICML, ICLR, AAAI, IJCAI, AAMAS, IROS, etc.
Journals: JMLR, JAIR, JAAMAS, etc.
- Asia
- CASIA
- Haifeng Zhang [Homepage] [Group]
- Zhiqiang Pu [Homepage]
- Dongbin Zhao [Homepage]
- Junliang Xing [Homepage]
- NJU
- Yang Yu [Homepage]
- Yinghuan Shi [Homepage]
- Yang Gao [Homepage]
- Zongzhang Zhang [Homepage]
- NJU SME Faculty
- SJTU
- Yong Yu [Homepage]
- Weinan Zhang [Homepage]
- Kai Yu [Homepage]
- Ying Wen [Homepage]
- PKU
- Yaodong Yang [Homepage]
- Zhihua Zhang [Homepage]
- Zongqing Lu [Homepage]
- Hao Dong [Homepage]
- THU
- Chongjie Zhang [Homepage]
- Yi Wu [Homepage] [Group]
- Zhihua Zhang [Homepage]
- Shengbo Li [Homepage] [Group]
- USTC
- Feng Wu [Homepage]
- Houqiang Li [Homepage]
- CUHK-SZ
- Baoxiang Wang [Homepage]
- Hongyuan Zha [Homepage]
- CUHK
- Baoxiang Wang
- TJU
- Jianye Hao [Homepage] [Group]
- SIAT
- Yunduan Cui [Homepage]
- HIT-SZ
- Yanjie Li [Homepage]
- NTU
- Bo An [Homepage]
- NUDT
- Xin Xu
- SYSU
- Chao Yu [Homepage]
- North America
- McGill
- Doina Precup
- Joelle Pineau
- Alberta
- Michael Bowling [Homepage]
- UCLA
- Bolei Zhou [Homepage]
- MIT
- Pulkit Agrawal
- Leslie Kaelbling
- Russ Tedrake
- Nicholas Roy
- CMU
- Geoffrey Gordon
- Emma Brunskill
- Jeff Schneider
- Andrew Moore
- Jessica K. Hodgins
- Wen Sun [Homepage]
- Berkeley
- Sergey Levine
- Michael Jordan
- Pieter Abbeel [Group]
- Dimitri Bertsekas
- Emma Brunskill
- Chelsea Finn
- Anca Dragan
- Ken Goldberg
- Stuart Russell
- Stanford
- Benjamin Van Roy
- Emma Brunskill
- Mykel Kochenderfer
- Dorsa Sadigh
- Tengyu Ma
- Chelsea Finn
- Andrew Ng
- UIUC
- Nan Jiang [Homepage]
- Duke
- Ronald Parr [Homepage]
- Brown
- Michael Littman
- Columbia
- Daniel Russo
- Shipra Agrawal
- Alekh Agarwal
- Alex Slivkins
- Toronto
- Jimmy Ba [Homepage]
- Sheila McIlraith [Homepage]
- Europe
- INRIA
- Flower Team [Homepage]
- ETH Zurich
- Andreas Krause [Homepage]
- Oxford
- Jakob Foerster
- Cambridge
- IC
- UCL
- Useful inequalities cheat sheet
- Concentration of measure
- dalmia/David-Silver-Reinforcement-learning: Notes for the Reinforcement Learning course by David Silver along with implementation of various algorithms. (github.com)
- Recommended reinforcement learning roadmap and resource collection (强化学习路线推荐及资料整理) - Zhihu (zhihu.com)
- PacktPublishing/Mastering-Reinforcement-Learning-with-Python: Mastering Reinforcement Learning with Python, published by Packt (github.com)
- Policy-based vs. Value-based [ZhiHu]
- Philosophy of Reinforcement Learning
This is an active repository and your contributions are always welcome!
If you find it helpful, please vote for it by adding 👍.
If you have any questions about this list, do not hesitate to contact me at jiangjiwen328@gmail.com.