A TensorFlow-based implementation of all the algorithms presented in Asynchronous Methods for Deep Reinforcement Learning.
This implementation uses processes instead of threads to achieve real concurrency. Each process holds a local replica of the network(s), implemented in TensorFlow, and runs its own TensorFlow session. In addition, a copy of the network parameters is kept in a shared memory space. At runtime, each process uses its own local network(s) to choose actions and compute gradients (with TensorFlow). The shared network parameters are then updated periodically and asynchronously, by applying the gradients obtained from TensorFlow to the parameters held in shared memory.
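To make that scheme concrete, here is a minimal, hypothetical sketch (the parameter shapes, function names, and worker loop are made up, and the TensorFlow gradient computation is stubbed out with zeros): each worker wraps the shared buffers as NumPy views, applies its gradients straight to them, and periodically copies the shared values back into its local replica.

```python
import numpy as np
from multiprocessing import Process, RawArray

# Hypothetical sketch of the shared-parameter scheme described above.
SHAPES = [(84 * 84 * 4, 256), (256, 6)]  # toy two-layer network

def make_shared_params(shapes):
    """One flat float32 shared buffer per parameter tensor."""
    return [RawArray('f', int(np.prod(s))) for s in shapes]

def worker(buffers, shapes, lr=1e-4, sync_every=5, steps=1000):
    # Wrap the shared buffers as NumPy views (no copy is made).
    shared = [np.frombuffer(b, dtype=np.float32).reshape(s)
              for b, s in zip(buffers, shapes)]
    local = [p.copy() for p in shared]  # local network replica

    for step in range(1, steps + 1):
        # ... act with the local replica and compute gradients with
        # TensorFlow here; zeros stand in for the real gradients ...
        grads = [np.zeros(s, dtype=np.float32) for s in shapes]

        # Apply the gradients directly to the shared parameters
        # (asynchronously; no locking, for simplicity).
        for p, g in zip(shared, grads):
            p -= lr * g

        # Periodically refresh the local replica from shared memory.
        if step % sync_every == 0:
            for l, p in zip(local, shared):
                l[:] = p

if __name__ == '__main__':
    buffers = make_shared_params(SHAPES)
    procs = [Process(target=worker, args=(buffers, SHAPES)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```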
Algorithms implemented:
- N-Step Q-Learning
- SARSA
- A3C
- A3C-LSTM (needs more testing)
On the horizon:
- Construct augmented rewards using pseudo-counts derived from a CTS density model, from Unifying Count-Based Exploration and Intrinsic Motivation (see the sketch below)
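For reference, the reward bonus described in that paper can be sketched as follows (function and argument names are illustrative: `rho` and `rho_prime` stand for the CTS model's probability of an observation before and after the model is updated on it, and `beta` is a tunable exploration coefficient, not a value from this repo):

```python
def pseudo_count(rho, rho_prime):
    """Pseudo-count implied by the density model's probabilities
    before (rho) and after (rho_prime) observing a frame."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def augmented_reward(reward, rho, rho_prime, beta=0.05):
    """Extrinsic reward plus the count-based exploration bonus."""
    n_hat = pseudo_count(rho, rho_prime)
    return reward + beta / ((n_hat + 0.01) ** 0.5)
```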
Both ALE and OpenAI Gym environments are supported.
(1) Go to the algorithms folder (<some path to this repo>/async-deep-rl/algorithms) and choose which algorithm to run via the configuration options in main.py.
(2) For example, to run an algorithm using OpenAI Gym with 16 processes and visualize the games:
$ python main.py BeamRider-v0 --env GYM -n 16 -v 1