Status: Active (under active development, breaking changes may occur)
Origin project: Deep-reinforcement-learning-with-pytorch
The original project has been unmaintained by its author for years, so this repository is a long-term, actively developed fork with bug fixes.
- python3
- tensorboardX
- gym==0.21.0
- tensorflow==1.15.2
- pytorch==1.4.0
- torchvision
We recommend using an Anaconda virtual environment to manage your packages.
- DuelingDQN: https://www.webofscience.com/wos/alldb/full-record/WOS:000684193702009
- NoisyDQN (NoisyNet; see the sketch below): https://arxiv.org/abs/1706.10295
- Parameter space noise for exploration: https://arxiv.org/abs/1706.01905
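As a concrete reference for the NoisyDQN entry above, here is a minimal sketch of a factorised-Gaussian noisy linear layer in PyTorch, following the NoisyNet paper. The `sigma0` constant and the resample-on-every-forward simplification are illustrative assumptions, not code from this repository.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (NoisyNet-style sketch)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers are resampled, not learned.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.bias_sigma.data.fill_(sigma0 / math.sqrt(in_features))

    @staticmethod
    def _f(x):
        # f(x) = sign(x) * sqrt(|x|), as in the paper.
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # For simplicity, resample noise on every forward pass
        # (train/eval mode handling omitted).
        self.eps_in.normal_()
        self.eps_out.normal_()
        eps_w = self._f(self.eps_out).ger(self._f(self.eps_in))  # outer product
        eps_b = self._f(self.eps_out)
        return F.linear(x,
                        self.weight_mu + self.weight_sigma * eps_w,
                        self.bias_mu + self.bias_sigma * eps_b)
```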
- DDPG original paper: https://arxiv.org/abs/1509.02971
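DDPG stabilises training by updating its target networks with Polyak averaging. A minimal sketch follows; `tau=0.001` is the value from the paper, not necessarily this repository's setting.

```python
import torch

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.001):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    for target_param, param in zip(target_net.parameters(), online_net.parameters()):
        target_param.mul_(1.0 - tau).add_(tau * param)
```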
- PPO original paper: https://arxiv.org/abs/1707.06347
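The core of PPO is the clipped surrogate objective from the paper above. A minimal sketch of that loss (the function name and `clip_eps` default are illustrative):

```python
import torch

def ppo_clip_loss(logp, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)]."""
    ratio = torch.exp(logp - logp_old)  # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```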
- Soft Q-learning: https://arxiv.org/abs/1702.08165
- SAC: http://arxiv.org/abs/1801.01290
- SAC with automated temperature (see the sketch below): http://arxiv.org/abs/1812.05905
- SAC for learning to walk: https://arxiv.org/abs/1812.11103v3
- SAC discrete: http://arxiv.org/abs/1910.07207
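In the automated-temperature SAC variant above, the temperature alpha is learned by gradient descent so that the policy entropy is driven toward a target. A minimal sketch, assuming a continuous action space; `action_dim` is hypothetical and the `-dim(A)` target-entropy heuristic is the one from the paper.

```python
import torch

action_dim = 6                        # hypothetical action dimension
target_entropy = -float(action_dim)   # paper's heuristic: -dim(A)

# Optimise log(alpha) rather than alpha so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob):
    """One gradient step on alpha, given log pi(a|s) for a batch of sampled actions."""
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()  # current temperature alpha
```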
- TD3 original paper: https://arxiv.org/abs/1802.09477
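TD3 curbs critic overestimation with clipped double-Q targets plus target-policy smoothing. A minimal sketch of the Bellman target computation; argument names are illustrative and the default hyperparameters match the paper's.

```python
import torch

def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Bellman target with target-policy smoothing and clipped double-Q."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        # Smooth the target policy with clipped Gaussian noise.
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Take the minimum of the two target critics to curb overestimation.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * target_q
```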
- SAC discrete
- NoisyDQN
- PPO2
- ACER
[01] A Brief Survey of Deep Reinforcement Learning
[02] The Beta Policy for Continuous Control Reinforcement Learning
[03] Playing Atari with Deep Reinforcement Learning
[04] Deep Reinforcement Learning with Double Q-learning
[05] Dueling Network Architectures for Deep Reinforcement Learning
[06] Continuous control with deep reinforcement learning
[07] Continuous Deep Q-Learning with Model-based Acceleration
[08] Asynchronous Methods for Deep Reinforcement Learning
[09] Trust Region Policy Optimization
[10] Proximal Policy Optimization Algorithms
[11] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
[12] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[13] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[14] Addressing Function Approximation Error in Actor-Critic Methods