README last updated on: 02/19/2018
Reinforcement learning framework and algorithms implemented in PyTorch.
Some implemented algorithms:
- Temporal Difference Models (TDMs)
- Deep Deterministic Policy Gradient (DDPG)
- (Double) Deep Q-Network (DQN)
- Soft Actor Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
To get started, check out the example scripts, linked above.
Install and use the included Anaconda environment:

```
$ conda env create -f docker/rlkit/rlkit-env.yml
$ source activate rlkit-env
(rlkit-env) $ # Ready to run examples/ddpg_cheetah_no_doodad.py
```
Alternatively, you can use the included Docker image.
During training, the results will be saved to a directory under
`LOCAL_LOG_DIR/<exp_prefix>/<foldername>`:
- `LOCAL_LOG_DIR` is the directory set by `rlkit.launchers.config.LOCAL_LOG_DIR`.
- `<exp_prefix>` is given to `setup_logger`.
- `<foldername>` is auto-generated and based off of `exp_prefix`.
- Inside this folder, you should see a file called `params.pkl`. To visualize a policy, run:

```
(rlkit-env) $ python scripts/sim_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl
```
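Since `<foldername>` is auto-generated, one convenient way to locate saved snapshots is to glob for `params.pkl` under the experiment prefix. The sketch below is illustrative only; the `LOCAL_LOG_DIR` value (`"data"`) and `exp_prefix` (`"ddpg-cheetah"`) are placeholder assumptions, so substitute your own settings from `rlkit.launchers.config`.

```python
import glob
import os

# Placeholder assumptions: substitute your configured LOCAL_LOG_DIR and the
# exp_prefix you passed to setup_logger.
LOCAL_LOG_DIR = "data"
exp_prefix = "ddpg-cheetah"

# <foldername> is auto-generated, so glob for it rather than hard-coding it.
pattern = os.path.join(LOCAL_LOG_DIR, exp_prefix, "*", "params.pkl")
snapshots = sorted(glob.glob(pattern))
for path in snapshots:
    print(path)  # each matching path can be handed to the visualization script
```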
If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page. tl;dr, run:

```
python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/
```
The SAC implementation provided here uses a single Gaussian policy, rather than the Gaussian mixture model described in the original SAC paper.
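To illustrate what "Gaussian policy" means here (this is a conceptual sketch, not rlkit's actual code): the policy samples from a single Gaussian parameterized by a mean and log standard deviation, then squashes the sample with `tanh` to keep actions bounded, as in the SAC paper.

```python
import math
import random

def sample_gaussian_action(mu, log_std, rng=random):
    """Sample one action from a unimodal Gaussian policy, then squash it
    with tanh so the action lies in (-1, 1).

    Conceptual sketch only: names and signature are illustrative, not rlkit's API.
    """
    std = math.exp(log_std)
    pre_tanh = mu + std * rng.gauss(0.0, 1.0)
    return math.tanh(pre_tanh)

random.seed(0)
action = sample_gaussian_action(mu=0.0, log_std=-1.0)
```

A mixture-of-Gaussians policy would instead first sample a mixture component and then sample from that component's Gaussian; the single-Gaussian variant trades some expressiveness for simplicity.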
A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically carbon copies of the rllab versions.