A3C

This project is my attempt at implementing A3C (Asynchronous Advantage Actor-Critic), the algorithm introduced in the paper "Asynchronous Methods for Deep Reinforcement Learning".

Currently the PyTorch version is functional, and a TensorFlow version is being worked on.

Environment

The model is trained on SpaceInvaders-v0 from OpenAI's gym library. At each step, the environment returns a 210x160 RGB screenshot as the observation, together with an integer reward as shown on screen and a boolean indicating whether the game is done. Here is an example of a full game cycle.
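The interaction loop described above can be sketched as follows. This is a minimal illustration, not the project's code: `StubEnv` and `play_episode` are hypothetical names, and the stub only mimics the classic gym interface (`reset()` returning an observation, `step(action)` returning observation, reward, done flag, and info dict) so that the loop runs without gym or the Atari ROMs installed. The reward values in the stub are arbitrary.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the classic gym interface."""

    HEIGHT, WIDTH, CHANNELS = 210, 160, 3  # frame shape stated above
    N_ACTIONS = 6  # SpaceInvaders has 6 discrete actions

    def __init__(self, episode_length=20, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def _frame(self):
        # Black RGB frame as nested lists: HEIGHT x WIDTH x CHANNELS.
        return [[[0] * self.CHANNELS for _ in range(self.WIDTH)]
                for _ in range(self.HEIGHT)]

    def reset(self):
        self.t = 0
        return self._frame()

    def step(self, action):
        self.t += 1
        reward = self.rng.choice([0, 5, 10])  # arbitrary integer rewards
        done = self.t >= self.episode_length
        return self._frame(), reward, done, {}


def play_episode(env, policy):
    """Run one full game and return (total reward, last observation)."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, obs


if __name__ == "__main__":
    env = StubEnv()
    rng = random.Random(1)
    total, _ = play_episode(env, lambda obs: rng.randrange(StubEnv.N_ACTIONS))
    print("episode reward:", total)
```

With gym installed, `gym.make("SpaceInvaders-v0")` would take the place of `StubEnv()`. Note that newer gym releases changed the return values of `reset()` and `step()` (the latter returns a five-element tuple), so the unpacking in `play_episode` would need adjusting there.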

Asynchronous Design

This project uses Python multiprocessing, which forks and executes subprocesses, rather than threading. The functions involved are CPU-bound, so contention for Python's global interpreter lock (GIL) would make multithreading almost as inefficient as running on a single thread.
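As a rough sketch of this design, the snippet below starts several worker processes that each do CPU-bound work and update a shared counter. It uses only the standard library; `worker` and `run_workers` are hypothetical names, the busy loop is a placeholder for an environment step plus gradient computation, and the shared counter stands in for whatever global state the real workers share. It assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def worker(counter, steps):
    """Stand-in for one asynchronous worker (hypothetical)."""
    for _ in range(steps):
        # Placeholder for CPU-bound work such as a forward/backward pass.
        sum(i * i for i in range(1000))
        # Serialize updates to the shared counter across processes.
        with counter.get_lock():
            counter.value += 1


def run_workers(n_workers=4, steps=50):
    # Assumption: "fork" is POSIX-only; on Windows use "spawn" instead.
    ctx = mp.get_context("fork")
    counter = ctx.Value("i", 0)  # shared integer, lock-protected
    procs = [ctx.Process(target=worker, args=(counter, steps))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print("total steps:", run_workers())
```

Because each worker is a separate process with its own interpreter and its own GIL, the CPU-bound loops can run in parallel on separate cores; the only point of contention is the brief lock around the shared counter.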

Training

The model parameters, preprocessing sequence, and training parameters from the original paper were replicated as closely as could be discerned. Training is currently in progress. The best result so far is:
