Soft Q-Learning is a deep reinforcement learning framework for training expressive, energy-based policies in continuous domains. This implementation is based on rllab. The full algorithm is detailed in our paper, Reinforcement Learning with Deep Energy-Based Policies, and videos can be found here.
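In brief (see the paper for the full derivation), the learned policy is an energy-based distribution whose negative energy is the soft Q-function, scaled by a temperature $\alpha$:

```latex
% Energy-based policy from the paper: actions are sampled in proportion
% to the exponentiated soft Q-values, with temperature \alpha.
\pi(a \mid s) \propto \exp\!\Big( \tfrac{1}{\alpha}\, Q_{\mathrm{soft}}(s, a) \Big)
```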
The implementation is compatible with the rllab interface (see documentation) and depends on some of its features, which are included in this package for convenience. Additionally, some of the examples use the MuJoCo physics engine; for installation, you might find the rllab documentation useful. You should add the MuJoCo library files and your license key to the `/vendor/mujoco` folder.
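As a sanity check, that folder should end up containing the MuJoCo binaries and the key file. The snippet below is an illustrative sketch, not part of the package; the exact file names (here a Linux MuJoCo 1.31 build, as used by rllab) are assumptions that depend on your platform and MuJoCo version:

```python
import os

# Hypothetical layout check: file names are assumptions for a Linux
# MuJoCo 1.31 install; adjust for your platform and version.
expected = [
    "vendor/mujoco/mjkey.txt",        # MuJoCo license key
    "vendor/mujoco/libmujoco131.so",  # MuJoCo shared library
]
for path in expected:
    print(path, "found" if os.path.exists(path) else "MISSING")
```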
You will need TensorFlow 1.0 or later. A full list of dependencies can be found in `requirements.txt`.
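Assuming a standard Python environment, the dependencies can typically be installed with `pip install -r requirements.txt`.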
There are two example environments (a minimal interaction sketch follows the list):
- In the `MultiGoal` environment, the task is to move a point-mass into one of four equally good goal locations (see our paper for details).
- In the `Swimmer` environment, a two-dimensional, three-link snake needs to learn to swim forwards and backwards.
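Both environments follow the rllab interface. The rollout below is a minimal sketch under that assumption; the `MultiGoalEnv` import path is hypothetical and may differ in your checkout:

```python
# Illustrative random rollout against an rllab-style environment.
# The import path is an assumption; adjust it to match the package layout.
from softqlearning.environments.multigoal import MultiGoalEnv

env = MultiGoalEnv()
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # uniform random action
    obs, reward, done, info = env.step(action)  # rllab-style Step tuple
    if done:
        obs = env.reset()
```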
To train these models, run

```
python softqlearning/scripts/learn_<env>.py
```

and to test a trained model, run

```
python softqlearning/scripts/sim_policy.py data/<env>/itr_<#>.pkl
```

where `<env>` is the name of an environment and `<#>` is an iteration number.
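For example, assuming the swimmer scripts follow the naming above, `python softqlearning/scripts/learn_swimmer.py` would start training the swimmer, and `python softqlearning/scripts/sim_policy.py data/swimmer/itr_100.pkl` would replay the snapshot from iteration 100 (the iteration number here is purely illustrative).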
The Soft Q-Learning package was developed by Haoran Tang and Tuomas Haarnoja, under the supervision of Pieter Abbeel and Sergey Levine, in 2017 at UC Berkeley. We thank Vitchyr Pong and Shane Gu, who helped us implement some parts of the code. The work was supported by Berkeley Deep Drive.
If you use this work, please cite our paper:

```
@inproceedings{haarnoja2017reinforcement,
  title={Reinforcement Learning with Deep Energy-Based Policies},
  author={Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey},
  booktitle={International Conference on Machine Learning},
  year={2017}
}
```