gdao-research/A2C

An Implementation of Synchronous Advantage Actor Critic in TensorFlow


Advantage Actor Critic (A2C)

An implementation of A2C (a synchronous variant of A3C), as described in the OpenAI blog post.

Intuition:

Multiple workers each step their own copy of the environment to collect a batch of data, so no replay buffer is needed.
Noise is added to the policy logits to encourage exploration.
A single gradient update step is performed on each collected batch.
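
The first two points and the targets used by the update step can be sketched in plain Python. This is a minimal illustration, not the code in this repository: the function names, the discount factor `gamma`, and the choice of Gumbel noise (the trick used in OpenAI Baselines) are assumptions, since the README does not say which noise distribution is used.

```python
import math
import random


def sample_action(logits, rng):
    """Pick an action by adding noise to the logits and taking the argmax.

    Assumption: the noise is Gumbel noise, -log(-log(u)) with u ~ Uniform(0, 1),
    which makes the argmax a sample from softmax(logits).
    """
    noisy = []
    for logit in logits:
        # Clamp away from 0 so that log(u) is always defined.
        u = max(rng.random(), 1e-12)
        noisy.append(logit - math.log(-math.log(u)))
    return max(range(len(noisy)), key=noisy.__getitem__)


def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Compute n-step returns for one worker's rollout, working backwards.

    bootstrap_value is the critic's estimate for the state after the last step;
    it is dropped whenever an episode ended (done is True).
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        running = reward + gamma * running * (1.0 - float(done))
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage = n-step return minus the critic's value estimate."""
    return [r - v for r, v in zip(returns, values)]


if __name__ == "__main__":
    rng = random.Random(0)
    action = sample_action([0.1, 0.2, 0.3], rng)
    rets = discounted_returns([1.0, 0.0, 1.0], [False, False, True], 5.0, gamma=0.5)
    advs = advantages(rets, [1.0, 1.0, 1.0])
    print(action, rets, advs)
```

In a full A2C step, the rollouts from all workers would be concatenated into one batch, and the advantages would weight the policy-gradient term of a single optimizer update.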

Environment

Python 3.6.5
TensorFlow 1.12
OpenAI Gym 0.10.5
OpenCV 4.0.0
mpi4py 3.0.0

* Note: All of the environment modifications were taken from the OpenAI Baselines repository.