README last updated on: 02/19/2018
Reinforcement learning framework and algorithms implemented in PyTorch.
Some implemented algorithms:
- Temporal Difference Models (TDMs)
- Deep Deterministic Policy Gradient (DDPG)
- (Double) Deep Q-Network (DQN)
- Soft Actor Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
To get started, check out the example scripts, linked above.
Install and use the included Anaconda environment:

```
$ conda env create -f docker/rlkit/rlkit-env.yml
$ source activate rlkit-env
(rlkit-env) $ # Ready to run examples/ddpg_cheetah_no_doodad.py
```
Alternatively, you can use the included Docker image.
During training, the results will be saved to a directory under
`LOCAL_LOG_DIR/<exp_prefix>/<foldername>`:
- `LOCAL_LOG_DIR` is the directory set by `rlkit.launchers.config.LOCAL_LOG_DIR`.
- `<exp_prefix>` is given to `setup_logger`.
- `<foldername>` is auto-generated and based off of `exp_prefix`.
- Inside this folder, you should see a file called `params.pkl`. To visualize a policy, run:

```
(rlkit-env) $ python scripts/sim_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl
```
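Since `<foldername>` is auto-generated, one convenient way to locate saved snapshots is to glob for `params.pkl` under the experiment prefix. The sketch below is illustrative only; the `LOCAL_LOG_DIR` value (`"data"`) and `exp_prefix` (`"ddpg-cheetah"`) are placeholder assumptions, so substitute your own settings from `rlkit.launchers.config`.

```python
import glob
import os

# Placeholder assumptions: substitute your configured LOCAL_LOG_DIR and the
# exp_prefix you passed to setup_logger.
LOCAL_LOG_DIR = "data"
exp_prefix = "ddpg-cheetah"

# <foldername> is auto-generated, so glob for it rather than hard-coding it.
pattern = os.path.join(LOCAL_LOG_DIR, exp_prefix, "*", "params.pkl")
snapshots = sorted(glob.glob(pattern))
for path in snapshots:
    print(path)  # each matching path can be handed to the visualization script
```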
If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page. tl;dr, run:

```
python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/
```
The SAC implementation provided here uses a single Gaussian policy, rather than the Gaussian mixture model described in the original SAC paper.
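To illustrate what "Gaussian policy" means here (this is a conceptual sketch, not rlkit's actual code): the policy samples from a single Gaussian parameterized by a mean and log standard deviation, then squashes the sample with `tanh` to keep actions bounded, as in the SAC paper.

```python
import math
import random

def sample_gaussian_action(mu, log_std, rng=random):
    """Sample one action from a unimodal Gaussian policy, then squash it
    with tanh so the action lies in (-1, 1).

    Conceptual sketch only: names and signature are illustrative, not rlkit's API.
    """
    std = math.exp(log_std)
    pre_tanh = mu + std * rng.gauss(0.0, 1.0)
    return math.tanh(pre_tanh)

random.seed(0)
action = sample_gaussian_action(mu=0.0, log_std=-1.0)
```

A mixture-of-Gaussians policy would instead first sample a mixture component and then sample from that component's Gaussian; the single-Gaussian variant trades some expressiveness for simplicity.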
A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically carbon copies of the rllab versions.