Status: Active (under active development, breaking changes may occur)
Origin project: Deep-reinforcement-learning-with-pytorch
The original project has been unmaintained by its author for years, so this repository is a long-term, actively developed fork with bug fixes.
- python3
- tensorboardX
- gym==0.21.0
- tensorflow==1.15.2
- pytorch==1.4.0
- torchvision
We recommend using an Anaconda virtual environment to manage your packages.
- DuelingDQN: https://www.webofscience.com/wos/alldb/full-record/WOS:000684193702009
- NoisyDQN (NoisyNet; see the sketch below): https://arxiv.org/abs/1706.10295
- Parameter space noise for exploration: https://arxiv.org/abs/1706.01905
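As a concrete reference for the NoisyDQN entry above, here is a minimal sketch of a factorised-Gaussian noisy linear layer in PyTorch, following the NoisyNet paper. The `sigma0` constant and the resample-on-every-forward simplification are illustrative assumptions, not code from this repository.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (NoisyNet-style sketch)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers are resampled, not learned.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.bias_sigma.data.fill_(sigma0 / math.sqrt(in_features))

    @staticmethod
    def _f(x):
        # f(x) = sign(x) * sqrt(|x|), as in the paper.
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # For simplicity, resample noise on every forward pass
        # (train/eval mode handling omitted).
        self.eps_in.normal_()
        self.eps_out.normal_()
        eps_w = self._f(self.eps_out).ger(self._f(self.eps_in))  # outer product
        eps_b = self._f(self.eps_out)
        return F.linear(x,
                        self.weight_mu + self.weight_sigma * eps_w,
                        self.bias_mu + self.bias_sigma * eps_b)
```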
- DDPG original paper: https://arxiv.org/abs/1509.02971
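DDPG stabilises training by updating its target networks with Polyak averaging. A minimal sketch follows; `tau=0.001` is the value from the paper, not necessarily this repository's setting.

```python
import torch

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.001):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    for target_param, param in zip(target_net.parameters(), online_net.parameters()):
        target_param.mul_(1.0 - tau).add_(tau * param)
```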
- PPO original paper: https://arxiv.org/abs/1707.06347
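The core of PPO is the clipped surrogate objective from the paper above. A minimal sketch of that loss (the function name and `clip_eps` default are illustrative):

```python
import torch

def ppo_clip_loss(logp, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)]."""
    ratio = torch.exp(logp - logp_old)  # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```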
- Soft Q-learning: https://arxiv.org/abs/1702.08165
- SAC: http://arxiv.org/abs/1801.01290
- SAC with automated temperature (see the sketch below): http://arxiv.org/abs/1812.05905
- SAC for learning to walk: https://arxiv.org/abs/1812.11103v3
- SAC discrete: http://arxiv.org/abs/1910.07207
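In the automated-temperature SAC variant above, the temperature alpha is learned by gradient descent so that the policy entropy is driven toward a target. A minimal sketch, assuming a continuous action space; `action_dim` is hypothetical and the `-dim(A)` target-entropy heuristic is the one from the paper.

```python
import torch

action_dim = 6                        # hypothetical action dimension
target_entropy = -float(action_dim)   # paper's heuristic: -dim(A)

# Optimise log(alpha) rather than alpha so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob):
    """One gradient step on alpha, given log pi(a|s) for a batch of sampled actions."""
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()  # current temperature alpha
```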
- TD3 original paper: https://arxiv.org/abs/1802.09477
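TD3 curbs critic overestimation with clipped double-Q targets plus target-policy smoothing. A minimal sketch of the Bellman target computation; argument names are illustrative and the default hyperparameters match the paper's.

```python
import torch

def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Bellman target with target-policy smoothing and clipped double-Q."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        # Smooth the target policy with clipped Gaussian noise.
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Take the minimum of the two target critics to curb overestimation.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * target_q
```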
- SAC discrete
- NoisyDQN
- PPO2
- ACER
[01] A Brief Survey of Deep Reinforcement Learning
[02] The Beta Policy for Continuous Control Reinforcement Learning
[03] Playing Atari with Deep Reinforcement Learning
[04] Deep Reinforcement Learning with Double Q-learning
[05] Dueling Network Architectures for Deep Reinforcement Learning
[06] Continuous control with deep reinforcement learning
[07] Continuous Deep Q-Learning with Model-based Acceleration
[08] Asynchronous Methods for Deep Reinforcement Learning
[09] Trust Region Policy Optimization
[10] Proximal Policy Optimization Algorithms
[11] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
[12] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[13] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[14] Addressing Function Approximation Error in Actor-Critic Methods