Reinforcement learning resources and links
- Hands-On Reinforcement Learning With Python
- Reinforcement Learning: Theory and Python Implementation
- An Introduction to Deep Reinforcement Learning
- Foundations and Trends® in Machine Learning
- Reinforcement Learning and Optimal Control
- CS 294: Deep Reinforcement Learning
- David Silver's course
- John Schulman's lectures
- Deep RL Bootcamp
- CS 287: Advanced Robotics, Fall 2015
- CS234: Reinforcement Learning Winter 2019
- Deep Learning (DLSS) and Reinforcement Learning (RLSS) Summer School, Montreal 2017
- Advanced Deep Learning and Reinforcement Learning
- Reinforcement learning tutorials (Morvan, 莫烦)
- Play Pong with deep reinforcement learning based on pixels
- Deep Learning in a Nutshell: Reinforcement Learning
- AlphaGo
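- Sergey Levine, robotics expert at UC Berkeley
- Andrew Ng, former chief scientist at Baidu
- Professor Richard S. Sutton, renowned reinforcement learning expert at the University of Alberta, Canada
- Dr. David Silver, lead programmer of the Google DeepMind AlphaGo project
- Professor Tuomas Sandholm, expert in computer game playing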
- Reinforcement learning resources curated
- Awesome Reinforcement Learning (RL) for Natural Language Processing (NLP)
- Paper list of multi-agent reinforcement learning (MARL)
- A list of recent papers regarding deep reinforcement learning
- TensorFlow implementation of Deep Reinforcement Learning papers
- Deep Reinforcement Learning Papers
- A project for learning and researching deep RL, maintained by university AI researchers
- Reinforcement learning materials: from getting started to giving up
- Reinforcement Learning Notebooks
- Deep Reinforcement Learning
- rllab
- Baselines
- Stable Baselines
- keras-rl
- BURLAP
- PyBrain
- RLPy
- A Matlab Toolbox for Approximate RL and DP
- PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL
- PyTorch implementations of various DRL algorithms for both single agent and multi-agent
- Deep Reinforcement Learning for Keras
- PyTorch implementations of DQN, AC, A2C, A3C, Policy Gradient, DDPG, TRPO, PPO, ACER
- Deep Reinforcement learning framework
- Code for understanding reinforcement learning (updating...)
- High-quality implementations of deep reinforcement learning algorithms written in PyTorch
- Implementation of reinforcement learning algorithms. Python, OpenAI Gym, TensorFlow. Exercises and solutions to accompany Sutton's book and David Silver's course
- Repo for the Deep Reinforcement Learning Nanodegree program
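- Tutorial | How to train a Donkey Car with reinforcement learning in a Unity environment
- An accessible explanation of the Dopamine paper, with environment setup and example analysis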
- DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
- StarCraft II - pysc2 Deep Reinforcement Learning Examples
- An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
- Using reinforcement learning to teach a car to avoid obstacles
- A reinforcement learning algorithm for the 2048 game
- DQN-arxiv (Deep Q-Networks): Mnih et al, 2013
- DQN-nature (Deep Q-Network): Mnih et al, 2015
- Double DQN (Double Q-Network): van Hasselt et al, 2015
- Dueling DQN (Dueling Q-Network): Wang et al, 2015
- QR-DQN (Quantile Regression DQN): Dabney et al, 2017
- AlphaGo (Mastering the Game of Go with Deep Neural Networks and Tree Search)
- AlphaZero-arxiv (Mastering Chess and Shogi by Self-Play): Silver et al, 2017
- AlphaGo Zero-nature (Mastering the Game of Go without Human Knowledge): Silver et al, 2017
- SAC (Off-Policy Maximum Entropy): Haarnoja et al, 2018
- SAC (Algorithms and Applications): Haarnoja et al, 2018
- A2C / A3C (Asynchronous Advantage Actor-Critic): Mnih et al, 2016
- PPO (Proximal Policy Optimization): Schulman et al, 2017
- TRPO (Trust Region Policy Optimization): Schulman et al, 2015
- DPG (Deterministic Policy Gradient): Silver et al, 2014
- DDPG (Deep Deterministic Policy Gradient): Lillicrap et al, 2015
- TD3 (Twin Delayed DDPG): Fujimoto et al, 2018
- NAF (Normalized Advantage Functions): Gu et al, 2016
- C51 (Categorical 51-Atom DQN): Bellemare et al, 2017
- HER (Hindsight Experience Replay): Andrychowicz et al, 2017
- World Models: Ha and Schmidhuber, 2018
- I2A (Imagination-Augmented Agents): Weber et al, 2017
- MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al, 2017
- MBVE (Model-Based Value Expansion): Feinberg et al, 2018
- PathNet (Evolution Channels Gradient Descent): Fernando et al, 2017
- PlaNet (Learning Latent Dynamics): Hafner et al, 2018
- TCN (Time-Contrastive Networks): Sermanet et al, 2017
- Reinforcement and Imitation Learning: Zhu et al, 2018
- Prioritized Experience Replay: Schaul et al, 2015
- Policy Distillation: Rusu et al, 2015
- Unifying Count-Based Exploration and Intrinsic Motivation: Bellemare et al, 2016
- Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models: Stadie et al, 2015
- Action-Conditional Video Prediction using Deep Networks in Atari Games: Oh et al, 2015
- Control of Memory, Active Perception, and Action in Minecraft: Oh et al, 2016