A2C

An Implementation of Synchronous Advantage Actor-Critic in TensorFlow


Advantage Actor Critic (A2C)

An implementation of A2C (a synchronous variant of A3C), as described in the OpenAI Baselines blog post.

Intuition:

  • Multiple workers collect experience from separate copies of the environment in parallel to form a batch of data $\rightarrow$ no need for a replay buffer.
  • Noise is added to the policy logits to ensure exploration.
  • One gradient update step is performed per collected batch.
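The three steps above can be sketched as follows. This is a minimal NumPy sketch, not the repository's code: the function names, the Gumbel-max form of the logit noise, and the n-step bootstrapped returns are assumptions about how such a loop is typically wired together.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(logits):
    # Exploration by adding noise to the logits: with Gumbel noise,
    # argmax(logits + noise) samples exactly from softmax(logits).
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return int(np.argmax(logits + gumbel))

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    # Discounted returns for one rollout segment, bootstrapped with
    # the critic's value estimate of the state after the segment.
    returns = np.empty(len(rewards))
    acc = bootstrap_value
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        returns[t] = acc
    return returns

def a2c_gradients(logits, actions, values, returns):
    # One update signal computed from the whole collected batch.
    # Advantage = n-step return - critic's value estimate.
    advantages = returns - values
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Gradient of -log pi(a|s) * advantage w.r.t. the logits
    # (softmax cross-entropy scaled by the advantage).
    grad_logits = probs.copy()
    grad_logits[np.arange(len(actions)), actions] -= 1.0
    grad_logits *= advantages[:, None]
    # Gradient of 0.5 * (value - return)^2 w.r.t. the value output.
    grad_values = values - returns
    return grad_logits, grad_values
```

In the actual implementation these gradients would come out of TensorFlow's autodiff; the batch is the concatenation of the segments gathered by all workers, and exactly one optimizer step is applied per batch.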

Environment

  • Python 3.6.5
  • TensorFlow 1.12
  • OpenAI Gym 0.10.5
  • OpenCV 4.0.0
  • mpi4py 3.0.0

* Note: All of the environment modifications were taken from the OpenAI Baselines repository.