The repository contains implementations of A3C, A2C, DDQN, and REINFORCE (naive) with TensorFlow 2.0. Some of them have been demonstrated in the OpenAI Cart Pole environment.
In addition, it modularizes the API of environments (Cart Pole, Flappy Bird, and a remote environment) and exploration strategies (still a work in progress). The remote environment even allows the agent to connect to an external server and interact with it.
Modularization is still in progress, but here is the DEMO on OpenAI Cart Pole. I use a master-slave strategy (similar to the parameter server strategy in TensorFlow 1), implemented with TensorFlow 2.0 and Python's multiprocessing module. Each worker sends its computed gradients to the master, and the master applies the incoming gradients to the global model. The master also keeps sending the latest model variables back to the workers.
However, TensorFlow 2.0 has removed tf.Session(), which could allocate a computation task to a specific device. Therefore, I use with tf.device() to specify the device for each task.
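The master-worker exchange described above can be sketched as follows. This is a minimal, self-contained illustration of the message pattern only: threads and plain-Python lists stand in for the repo's actual processes and TensorFlow tensors, the gradient computation is a placeholder, and the exchange here is round-based rather than fully asynchronous. All function names are illustrative.

```python
import queue
import threading

def worker(var_q, grad_q, steps):
    # Worker loop: receive the latest global variables, compute a
    # placeholder gradient from them, and send it back to the master.
    for _ in range(steps):
        variables = var_q.get()
        gradients = [2.0 * v for v in variables]  # stand-in for real policy gradients
        grad_q.put(gradients)

def train(n_workers=2, steps=3, lr=0.1):
    grad_q = queue.Queue()
    var_qs = [queue.Queue() for _ in range(n_workers)]
    variables = [1.0, -1.0]  # global model parameters
    workers = [threading.Thread(target=worker, args=(q, grad_q, steps))
               for q in var_qs]
    for w in workers:
        w.start()
    for _ in range(steps):
        for q in var_qs:                # master broadcasts the latest variables
            q.put(list(variables))
        for _ in range(n_workers):      # master applies each incoming gradient
            grads = grad_q.get()
            variables = [v - lr * g for v, g in zip(variables, grads)]
    for w in workers:
        w.join()
    return variables
```

In the real implementation the "apply gradients" step would be `optimizer.apply_gradients(...)` on the global model, wrapped in `with tf.device(...)` to pin it to the desired device.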
For more details, please read the doc: Tricks of A3C on TensorFlow2 + Multiprocessing
and here is the DEMO
Tensorflow DEMO on Cart Pole
Tensorflow DEMO on Flappy Bird
Implementation of Actor-Critic Network
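The core quantities an actor-critic network trains on are the discounted returns and the advantages. The sketch below is a plain-Python illustration of that computation (not taken from the repo's code); the actor is trained on `log pi(a_t|s_t) * A_t` and the critic regresses `V(s_t)` toward the return `G_t`.

```python
def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    # Work backwards through the episode: G_t = r_t + gamma * G_{t+1}.
    # `bootstrap` is V(s_T) when the episode was cut off, else 0.
    returns, g = [], bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99):
    # Advantage A_t = G_t - V(s_t): how much better the taken action
    # turned out than the critic's baseline estimate.
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]
```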
Implementation of Double Deep Q-Network with Tensorflow
Working on it
Working on it
Integrate the API of different environments.
From OpenAI Cart Pole.
From PLE Flappy Bird.
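One way to integrate these environments behind a single API is a small interface that every backend (Cart Pole, Flappy Bird, remote server) implements. This is a sketch of the idea, assuming a Gym-like `reset`/`step` contract; the class names and the toy environment are illustrative, not the repo's actual classes.

```python
from abc import ABC, abstractmethod

class Environment(ABC):
    """Common interface wrapping Cart Pole, Flappy Bird, or a remote env."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial observation."""

    @abstractmethod
    def step(self, action):
        """Apply an action; return (observation, reward, done)."""

class CountdownEnv(Environment):
    # Toy backend for illustration: episode ends after n steps,
    # reward 1.0 per step, observation is the steps remaining.
    def __init__(self, n=3):
        self.n = n
        self.left = n

    def reset(self):
        self.left = self.n
        return self.left

    def step(self, action):
        self.left -= 1
        return self.left, 1.0, self.left == 0
```

An agent written against `Environment` can then switch between local and remote backends without changing its training loop.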
Create a TCP client and connect to the provided server. You can see the DEMO and the details in another repo (RL Java Integral), where we implement a Java multi-threaded server that interacts with the A3C model. Thanks to tom1236868 for implementing the Java server.
Reference: