
TARS-RL

Distributed Reinforcement Learning Framework.

Algorithms

  • DDPG [1]
  • C51 (Categorical DDPG) [2]
  • QR-DQN (Quantile DDPG) [3]
  • Soft Actor-Critic (SAC) [4]
  • TD3 [5]
  • Quantile TD3
  • Ensemble of the above algorithms, trained on the same batch (see the sketch after this list)
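
The ensemble option means the algorithms are updated together, each on the same sampled batch, rather than each sampling its own. A schematic sketch of that idea; the train-step functions and the replay buffer here are hypothetical stand-ins for illustration, not TARS-RL code:

import random

# Hypothetical per-algorithm update functions, for illustration only;
# in the real framework each algorithm has its own training op.
def ddpg_step(batch):
    pass  # one gradient update of the DDPG actor/critic

def td3_step(batch):
    pass  # one gradient update of the TD3 actor/critics

ensemble = [ddpg_step, td3_step]
replay_buffer = [("obs", "action", 0.0, "next_obs", False)] * 1000

# One training iteration: a single batch is sampled from the shared
# replay buffer and every algorithm in the ensemble trains on it.
batch = random.sample(replay_buffer, 256)
for train_step in ensemble:
    train_step(batch)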

Features

  • Client-server architecture (you don't need to incorporate the RL framework into your environment, just use the client; see the sketch after this list)
  • The server collects experience and trains on it
  • An arbitrary number of parallel agents (clients) can send gathered experience to the server over the network
  • All hyperparameters in one file
  • Different exploration parameters for every agent
  • Easy to implement new algorithms
  • Supports any Gym-compatible environment out of the box
  • Python 3.6
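
From an agent's point of view, the client-server split looks roughly like the sketch below. Everything here is illustrative: RLClient, its constructor arguments, its methods, and the port number are hypothetical names, not the actual TARS-RL API (see the rl_server package for the real interface), and the Gym calls assume the pre-0.26 Gym API this project was written against.

import gym

# Hypothetical client API, for illustration only.
class RLClient:
    def __init__(self, server_host, server_port):
        # Would connect to the training server over the network.
        self.server = (server_host, server_port)

    def act(self, observation, action_space):
        # Would query the current policy (plus exploration noise);
        # random actions stand in for it here.
        return action_space.sample()

    def store_transition(self, obs, action, reward, next_obs, done):
        # Would send the transition to the server's replay buffer.
        pass

env = gym.make("LunarLanderContinuous-v2")
client = RLClient(server_host="localhost", server_port=8777)

obs = env.reset()
for _ in range(1000):
    action = client.act(obs, env.action_space)
    next_obs, reward, done, info = env.step(action)
    client.store_transition(obs, action, reward, next_obs, done)
    obs = env.reset() if done else next_obs
env.close()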

Example envs

OpenAI Gym:

  • Bipedal Walker (both simple and hardcore)
  • Lunar Lander
  • Pendulum

Challenges:

  • NeurIPS 2017: Learning to Run
  • NeurIPS 2018: AI for Prosthetics Challenge
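
All of the Gym examples are standard environments; a quick sanity check that one of them is available (assuming gym with the Box2D extras from the installation section, and the pre-0.26 Gym API):

import gym

# Create one of the example environments and run a few random steps.
env = gym.make("LunarLanderContinuous-v2")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()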

Documentation

See the config file description.
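
The experiment configs (e.g. experiments/lunar_lander/config_ddpg.yml, used in the run commands below) are YAML files. A minimal sketch of inspecting one, assuming the files are plain YAML and PyYAML is installed; the section names printed are whatever the file contains, not a documented schema:

import yaml

# Load an experiment config and list its top-level sections.
with open("experiments/lunar_lander/config_ddpg.yml") as f:
    config = yaml.safe_load(f)

for section, value in config.items():
    print(section, type(value).__name__)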

Installation

Step 0 (optional, but highly recommended). Install Anaconda with Python 3.6 from the download page or the archived versions.

1. Clone repo
$ git clone

2. Add to PATH your anaconda
$ export PATH=/path/to/your/anaconda/bin/:$PATH

3. Install requirements
$ pip install tensorflow
or
$ pip install tensorflow-gpu
if you have a GPU supported by TensorFlow

$ pip install tensorboardX

4. For OpenAI gym examples
$ pip install 'gym[box2d]'

or see how to install all Gym envs: https://github.com/openai/gym

How to run

$ cd root/of/tars-rl

run server (Lunar Lander config as an example)
$ python -m rl_server.server.run_server --config experiments/lunar_lander/config_ddpg.yml

run agents
(7 parallel agents on your computer,
 assuming your CPU has 8 threads)
$ CUDA_VISIBLE_DEVICES="" python -m rl_server.server.run_agents --config experiments/lunar_lander/config_ddpg.yml
 
CUDA_VISIBLE_DEVICES="" keeps the agents off the GPU
so they don't interrupt the server's train operations
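
The same effect is available from inside Python; hiding CUDA devices this way is standard TensorFlow/CUDA behavior, not TARS-RL-specific. The variable just has to be set before TensorFlow is first imported:

import os

# Hide all CUDA devices so this process runs CPU-only.
# Must be set before TensorFlow is first imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf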

run a trained policy from a checkpoint without the server
$ python -m rl_server.server.play --config path/to/config.yml --checkpoint path/to/model-10000.ckpt --seed 1234

Credits

References

  1. Continuous Control with Deep Reinforcement Learning (DDPG)
  2. A Distributional Perspective on Reinforcement Learning (C51)
  3. Distributional Reinforcement Learning with Quantile Regression (QR-DQN)
  4. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (SAC-GMM)
  5. Addressing Function Approximation Error in Actor-Critic Methods (TD3)
  6. Layer Normalization
  7. Parameter Space Noise for Exploration
  8. Noisy Networks for Exploration

Roadmap

  1. Train envs, make videos, write docs
  2. Release TARS-RL
  3. Add HER
  4. Support pytorch
  5. Add self-play