- Spinning Up, OpenAI
- rllab, Berkeley RL Lab
- RLlib, UC Berkeley's RISELab, Ray Project
- rlkit, Vitchyr Pong's implementations
- Stable-baselines
- Baselines, OpenAI
- TensorForce
- Keras RL
- Dopamine, Google
- Coach, Intel
- PARL, Baidu
- torchrl, salmanazarr's implementations
- rlpack, liber145's implementations
- ChainerRL, based on Chainer
- SLM-Lab, Wah Loon Keng's implementations
- adeptRL, heronsystems
- autonomous-learning-library, Chris Nota's implementations
- rlgraph, RLgraph
- rllite, lite repository for RL
- Reinforcement Learning: An Introduction
- CS 294-112, UC Berkeley
- David Silver's courses
- Deep RL Bootcamp, UC Berkeley
- MIT 6.S094: Deep Learning for Self-Driving Cars 2018 Lecture 3 Notes: Deep Reinforcement Learning
- UCL Advanced Deep Learning & Reinforcement Learning
- CS234: Reinforcement Learning Winter 2019, Stanford
- awesome-deep-rl
- An Introduction to Deep Reinforcement Learning
- Morvan Reinforcement Learning
- PyTorch tutorials of RL
- Arthur Juliani's blog
- Thomas Simonini's DRL course
- Lilian Weng's blog
- Andrej Karpathy's blog
- Hung-yi Lee's DRL lectures (Chinese version)
- Spinning Up in Deep RL Workshop, OpenAI
- Reproducible, Reusable, and Robust RL, Joelle Pineau
- Udacity, RL by Georgia Tech
- Deep Learning Summer School, Montreal 2016
- CS 287: Advanced Robotics, Fall 2015
- RLKorea (Korean version)
- phd-bibliography
- Gym, OpenAI
- Universe, OpenAI
- Roboschool, OpenAI
- PlayGround: AI Research into Multi-Agent Learning
- Neural MMO: A Massively Multiagent Game Environment
- ELF, a platform for game research, Facebook
- Reaver, StarCraft II
- PyBullet: Real-Time Physics Simulation
- highway driving environments
- ma-gym
- multiworld, Vitchyr Pong
- RL-Adventure, Dulat Yerzat's implementation
- RL-Adventure2, Dulat Yerzat's implementation
- John Schulman's RL repository
- TianhongDai's implementations
- Denny Britz's implementations
- steveKapturowski's implementations
- DeepRL, ShangtongZhang's implementations
- reinforcement-learning-an-introduction, ShangtongZhang's implementations
- Reinforcement-learning-with-tensorflow, Morvan Zhou's implementations
- mario_rl, Chanwoong Joo's implementations
- PyTorch-RL, Ye Yuan's implementations
- pytorchrl, Ermo Wei's implementations
- Entropy-Regularized-RL, lihaoruo's implementations
- Tidy-Reinforcement-learning, sarcturus00's implementations
- Random Network Distillation, jcwleo's implementation
- rl-baselines-zoo, pre-trained RL agents using stable-baselines
- rl-agents, Edouard Leurent's implementations
- pytorch-rl, Navneet M Kumar's implementations
- lets-do-irl, Reinforcement Learning KR's implementations
- tensorflow_practice/rl, princewen's implementations
- self-imitation-learning, Junhyuk Oh's implementations
- pytorch-a2c-ppo-acktr-gail, Ilya Kostrikov's implementations
- Deep_RL_with_pytorch, sungyubkim's implementations
- pytorch-madrl, Chenglong Chen's implementations
- Deep-reinforcement-learning-with-pytorch, Johnny He's implementations
- Sam Greydanus's A3C implementation
- TD3 implementation
- SAC implementation
- simple-A2C, Rudy Gilman's implementations
- DeepPILCO, zuoxingdong's implementation
- FeUdal-montezuma, Woongwon Lee's implementations
- learning-to-communicate-pytorch, Minqi's implementations
- pytorch-cpp-rl, Isaac Poulton's implementations
- deep-rl, Pedro Morais's implementations
- Inverse-Reinforcement-Learning, Matthew Alger's implementations
- ReinforcementLearning-AtariGame, Nasrudin Bin Salim's implementations
- RND-Pytorch, WizDom13's implementations
- Deep-Reinforcement-Learning-Algorithms-with-PyTorch, Petros Christodoulou's implementations
- reinforcement_learning, Yiren Lu's implementations
- rl_a3c_pytorch, David Griffis's implementations
- RL-Experiments, Yanhua Huang's implementations
- Hierarchical-Actor-Critc-HAC-, Andrew Levy's implementation
- tensorflow_RL, RLOpensource
- noreward-rl, Deepak Pathak's implementations
- robotics-rl-srl, Antonin RAFFIN's implementations
- Super-Mario-Bros-RL, Amine SADEQ's implementations
- rltf, Nikolay Nikolov's implementations
- reward-learning-rl, Avi Singh's implementation
- rl-starter-files, Lucas Willems's implementation
- DRL, createamind's implementations
- lagom, Xingdong Zuo's implementations
- rltime, Opher Lieber's implementations
- rl_algorithms, Medipixel
- rlpyt, astooke's implementations
- Model-based-papers
- DRL papers 2015-2016, Junhyuk Oh
- DRL papers 2015-2016, Yasuhiro Fujita
- MARL-Papers
- Spinning up, OpenAI
- Play Atari games, DeepMind
- AlphaGo & AlphaGo Zero, playing Go, DeepMind
- AlphaStar, playing StarCraft II, DeepMind
- Autonomous helicopter flight, Stanford University
- Skill learning in 3D simulator, Xue Bin Peng
- Agile locomotion for quadruped robots, Jie Tan
- ANYmal robot skill learning, Synced
- Modular legged robot skill learning, Disney Research
- RL in business (Chinese version), Alibaba
- LOXM, executing trades, J.P. Morgan
Technique | Benefit | Mentioned Key Algorithm |
---|---|---|
Target network | Stabilize the training process | DQN, 2015 |
Replay (memory) buffer | Break the correlation between consecutive samples | DQN, 2015 |
KL-constrained update | Limit the size of each policy update step | TRPO, 2015 |
Advantage function | Stabilize learning by reducing policy-gradient variance | A3C, 2016 |
Importance sampling | Improve data efficiency | PER, 2016 |
Entropy regularization | Better exploration | Soft Q-Learning, 2017 |
Boltzmann policy | Richer mathematical meaning | Soft Q-Learning, 2017 |
Target policy smoothing | Keep the policy from exploiting incorrect sharp peaks in the Q-function | TD3, 2018 |
Clipped double-Q learning | Reduce overestimation in the Q-function | TD3, 2018 |
Reparameterized policy | Lower-variance gradient estimate | SAC, 2018 |
PS: The "Mentioned Key Algorithm" may not be the first algorithm to use the technique, but it is one whose paper explains the technique in detail.
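
Two of the techniques in the table, target policy smoothing and clipped double-Q learning, are easy to show together because both act on the bootstrapped target value. The following is a minimal sketch for a single transition with a scalar action, not a reference implementation. The function name `td3_target`, the helper `clip`, the callables `target_policy`, `target_q1`, `target_q2`, and all default hyperparameter values are assumptions chosen for illustration.

```python
import random


def clip(x, low, high):
    """Clamp a scalar to the interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_state, target_policy, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Bootstrapped target for one transition with a scalar action.

    target_policy, target_q1 and target_q2 are hypothetical callables that
    stand in for the target actor and the two target critics.
    """
    # Target policy smoothing: perturb the target action with clipped noise,
    # so the target does not rely on a single, possibly spurious, Q-value peak.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_policy(next_state) + noise, -act_limit, act_limit)

    # Clipped double-Q learning: use the smaller of the two critic estimates
    # to counter overestimation.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))

    # No bootstrapping past a terminal state (done is 0.0 or 1.0).
    return reward + gamma * (1.0 - done) * q_min
```

With two constant critics that return 1.0 and 2.0, a reward of 1.0, and `gamma=0.5`, the target uses the smaller estimate: 1.0 + 0.5 * 1.0 = 1.5. In a real agent the callables would be neural networks evaluated on batches, which this sketch leaves out.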