LLT1/distributed_reinforcement_learning

Implementation of Distributed Reinforcement Learning with TensorFlow

Information

20 actors with 1 learner.
TensorFlow implementation using Distributed TensorFlow in a server-client architecture.
Recurrent Experience Replay in Distributed Reinforcement Learning (R2D2) is implemented in the CartPole-v0 environment as a POMDP (only the position state is observed).
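
One way to read "only position state" is sketched below. CartPole-v0 normally returns four values (cart position, cart velocity, pole angle, pole angular velocity). Dropping the two velocities leaves an observation from which the full state cannot be recovered in a single step, which is where a recurrent network helps. The helper name `position_only` and the choice to keep indices 0 and 2 are assumptions for illustration and are not taken from this repository's code.

```python
# Hypothetical helper: turn a full CartPole observation into a partial one.
# CartPole-v0 observations are ordered as
# [cart position, cart velocity, pole angle, pole angular velocity].
POSITION_INDICES = (0, 2)  # assumption: keep cart position and pole angle


def position_only(observation):
    """Return only the position-like entries of a CartPole observation."""
    return [float(observation[i]) for i in POSITION_INDICES]


if __name__ == "__main__":
    full_state = [0.01, -0.20, 0.03, 0.15]
    print(position_only(full_state))
```

Since the velocities are hidden, an agent would have to infer them from consecutive partial observations, for example through the recurrent state used by R2D2.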

Dependency

opencv-python
gym[atari]
tensorboardX
tensorflow==1.14.0

Implementation

How to Run

Ape-X: Distributed Prioritized Experience Replay

python train_apex.py --job_name learner --task 0

CUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 0
CUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 1
CUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 2
...
CUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 19
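
The `--job_name` and `--task` flags resemble the job/task addressing of Distributed TensorFlow, where every process is identified by a job name and an index within that job. A minimal sketch of a matching cluster layout for 1 learner and 20 actors follows. The host, the port numbers, and the helper `build_cluster` are hypothetical, and the addresses actually used by this repository may differ.

```python
# Hypothetical cluster layout: 1 learner and 20 actors on one machine.
# Host and base port are placeholders, not values from this repository.
def build_cluster(num_actors=20, host="localhost", base_port=8000):
    """Map each job name to the list of "host:port" addresses of its tasks."""
    learner = ["{}:{}".format(host, base_port)]
    actors = ["{}:{}".format(host, base_port + 1 + i) for i in range(num_actors)]
    return {"learner": learner, "actor": actors}


cluster = build_cluster()
# In TensorFlow 1.x a dict of this shape can be passed to
# tf.train.ClusterSpec(cluster). Assuming the script maps its flags to
# job_name and task_index, a process started with
# "--job_name actor --task 3" would correspond to cluster["actor"][3].
print(len(cluster["learner"]), len(cluster["actor"]))
```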

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

python train_impala.py --job_name learner --task 0

CUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 0
CUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 1
CUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 2
...
CUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 19

R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

python train_r2d2.py --job_name learner --task 0

CUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 0
CUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 1
CUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 2
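
Starting each process by hand gets tedious with many actors. The sketch below builds the same commands as above for any of the three scripts and can optionally start them with `subprocess`. The helper names `build_commands` and `launch` are hypothetical and not part of this repository. The number of actors is a parameter because the lists above are abbreviated.

```python
import os
import subprocess
import sys


def build_commands(script, num_actors):
    """Return (learner_cmd, actor_cmds) mirroring the commands listed above."""
    learner = [sys.executable, script, "--job_name", "learner", "--task", "0"]
    actors = [
        [sys.executable, script, "--job_name", "actor", "--task", str(i)]
        for i in range(num_actors)
    ]
    return learner, actors


def launch(script, num_actors):
    """Start the learner, then the actors with GPUs hidden from them."""
    learner_cmd, actor_cmds = build_commands(script, num_actors)
    actor_env = dict(os.environ, CUDA_VISIBLE_DEVICES="-1")  # CPU-only actors
    procs = [subprocess.Popen(learner_cmd)]
    procs += [subprocess.Popen(cmd, env=actor_env) for cmd in actor_cmds]
    return procs


# Dry run: print the commands instead of starting any process.
learner_cmd, actor_cmds = build_commands("train_r2d2.py", 3)
print(" ".join(learner_cmd))
for cmd in actor_cmds:
    print("CUDA_VISIBLE_DEVICES=-1 " + " ".join(cmd))
```

Calling `launch("train_r2d2.py", 3)` from the repository root would start one learner and three actors. The returned process handles can be waited on or terminated by the caller.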

Reference