/DeepRL_Algorithms

Easy-to-read implementations of DeepRL algorithms in Pytorch and Tensorflow 2 (DQN, REINFORCE, VPG, A2C, TRPO, PPO, DDPG, TD3, SAC)


About Deep Reinforcement Learning

The combination of Reinforcement Learning and Deep Learning has produced a series of important algorithms. This project follows the relevant papers and implements their algorithms as faithfully as possible.

This repo aims to implement Deep Reinforcement Learning algorithms using Pytorch and Tensorflow 2.

1.Why do this?

  • Implementing all of these algorithms from scratch really helps you with parameter tuning;
  • The coding process gives you a better understanding of the principles behind each algorithm.

2.Lists of Algorithms

2.1 Value based

Value based algorithms include DQNs.

[1]. DQN Pytorch / Tensorflow, Paper: Playing Atari with Deep Reinforcement Learning
[2]. Double DQN Pytorch / Tensorflow, Paper: Deep Reinforcement Learning with Double Q-learning
[3]. Dueling DQN Pytorch / Tensorflow, Paper: Dueling Network Architectures for Deep Reinforcement Learning
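
The three variants above differ mainly in how the bootstrap target and the Q-values are formed. The following is a minimal sketch in plain Python, not the code from this repo: the function names are hypothetical, Q-values are passed as simple lists, and a separate target network is assumed.

```python
from typing import List, Sequence


def dqn_target(reward: float, done: bool,
               next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """DQN target: bootstrap from the target network's own maximum Q-value."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, done: bool,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


# Illustrative values only (assumed, not taken from any benchmark).
next_q_online = [1.0, 3.0, 2.0]
next_q_target = [2.5, 2.0, 4.0]
print(dqn_target(1.0, False, next_q_target, gamma=0.5))                        # 3.0
print(double_dqn_target(1.0, False, next_q_online, next_q_target, gamma=0.5))  # 2.0
print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))                                  # [0.0, 1.0, 2.0]
```

In this example the DQN target takes the largest target-network value (4.0), while Double DQN evaluates the action preferred by the online network (index 1), which gives a smaller target.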

2.2 Policy based

Policy based algorithms, including Policy Gradient methods, currently tend to perform better.

[1]. REINFORCE Pytorch / Tensorflow, Paper: Policy Gradient Methods for Reinforcement Learning with Function Approximation
[2]. VPG(Vanilla Policy Gradient) Pytorch / Tensorflow, Paper: High Dimensional Continuous Control Using Generalized Advantage Estimation
[3]. A2C Pytorch, Paper: Asynchronous Methods for Deep Reinforcement Learning (A2C is the synchronous version of A3C)
[4]. DDPG Pytorch, Paper: Continuous Control With Deep Reinforcement Learning
[5]. TRPO Pytorch / Tensorflow, Paper: Trust Region Policy Optimization
[6]. PPO Pytorch / Tensorflow, Paper: Proximal Policy Optimization Algorithms
[7]. SAC Pytorch, Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[8]. SAC with Automatically Adjusted Temperature Pytorch, Paper: Soft Actor-Critic Algorithms and Applications
[9]. TD3(Twin Delayed DDPG) Pytorch, Paper: Addressing Function Approximation Error in Actor-Critic Methods
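
Two building blocks recur across several of the methods listed above: the discounted return used by REINFORCE-style estimators and the clipped surrogate objective from the PPO paper. The following is a minimal sketch in plain Python under stated assumptions: the function names are hypothetical, inputs are plain lists, and this is not the repo's implementation.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clip_objective(new_log_probs: Sequence[float],
                       old_log_probs: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).

    This is the quantity PPO maximizes; a loss would be its negative.
    """
    if len(advantages) == 0:
        raise ValueError("advantages must not be empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With a positive advantage, the clip removes any gain from pushing the probability ratio above 1 + clip_eps; with a negative advantage, the unclipped term is kept when it is the smaller one, so the objective stays pessimistic.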

2.3 Imitation Learning

Imitation learning learns a policy from expert data.

[1]. GAIL Pytorch, Paper: Generative Adversarial Imitation Learning

3.Project Dependencies

  • Python >=3.6
  • Tensorflow >= 2.4.0
  • Pytorch >= 1.5.0
  • Seaborn >= 0.10.0
  • Click >= 7.0

Full dependencies are listed in the requirements.txt file; install them with pip:

pip install -r requirements.txt

You can install the project by typing the following command:

pip install -e .

4.Run

Each algorithm is implemented in its own folder, which contains 4 files:

1. main.py # A minimal executable example for the algorithm

2. [algorithm].py # Main body of the algorithm implementation

3. [algorithm]_step.py # Core update step of the algorithm

4. test.py # Loads a pretrained model and tests the performance of the algorithm

The default main.py is an executable example; its parameters are parsed by click.

You can run an algorithm from main.py or from the bash scripts.

  • You can simply type python main.py --help in the algorithm package to view all configurable parameters.
  • The Scripts directory provides some bash scripts, which you can modify at will.
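
As a rough illustration of how a click-based main.py exposes its parameters, here is a minimal sketch. The option names (--env_id, --lr, --max_iter, --seed) and their defaults are assumptions made for this example and may not match the options of any main.py in this repo; run python main.py --help to see the real ones.

```python
import click


@click.command()
@click.option("--env_id", type=str, default="CartPole-v1",
              help="Environment id (hypothetical option)")
@click.option("--lr", type=float, default=3e-4,
              help="Learning rate (hypothetical option)")
@click.option("--max_iter", type=int, default=1000,
              help="Number of training iterations (hypothetical option)")
@click.option("--seed", type=int, default=1,
              help="Random seed (hypothetical option)")
def main(env_id, lr, max_iter, seed):
    # A real main.py would build the agent and run the training loop here;
    # this sketch only echoes the parsed configuration.
    click.echo(f"env_id={env_id} lr={lr} max_iter={max_iter} seed={seed}")


if __name__ == "__main__":
    main()
```

Saved as main.py, this sketch could be invoked as python main.py --lr 0.001 --seed 7, and python main.py --help would list the four options with their help strings.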

5.Visualization of performance

Utils/plot_util.py provides a simple plotting tool based on Seaborn and Matplotlib. All the plots in this project are drawn with it.

5.1 Benchmarks for DQNs

Pytorch Version

bench_dqn

Tensorflow2 Version

bench_dqn_tf2

5.2 Benchmarks for PolicyGradients

Pytorch Version

bench_pg

Tensorflow2 Version

Currently only VPG, PPO and TRPO are available:

bench_pg_tf2