This repository contains the code for:
Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control [arXiv]. Also in ICLR 2021 and the NeurIPS 2019 Deep RL Workshop.
Zhuang Liu*, Xuanlin Li*, Bingyi Kang* and Trevor Darrell (* equal contribution)
Our code is adapted from OpenAI Baselines and SAC.
Deep Reinforcement Learning (Deep RL) has been receiving increasing attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques for training neural networks (e.g., L2 regularization, dropout, batch normalization) have been largely overlooked in policy optimization; this repository implements them for several policy optimization algorithms and provides the code for our empirical study.
git clone https://github.com/rll/rllab PATH_TO_RLLAB_FOLDER
Install MuJoCo (but do not install mujoco_py yet) by following the instructions at https://github.com/openai/mujoco-py.
Copy additional_lib_for_rllab/libglfw.so.3 and additional_lib_for_rllab/libmujoco131.so from this repository, together with mjkey.txt from your MuJoCo key path, into a new folder named PATH_TO_RLLAB_FOLDER/vendor/mujoco.
Fix a typo in rllab:
vi PATH_TO_RLLAB_FOLDER/rllab/sampler/stateful_pool.py
and change
from joblib.pool import MemmapingPool
to
from joblib.pool import MemmappingPool
Set up a virtual environment:
virtualenv ENV_NAME --python=python3
source ENV_NAME/bin/activate
Install mujoco_py for MuJoCo (version 2.0) by following the instructions at https://github.com/openai/mujoco-py.
Next, modify .bashrc (or create a shell script named SOMESCRIPT.sh and source SOMESCRIPT.sh before training) to add:
export PYTHONPATH=PATH_TO_THIS_REPO/baselines_release:$PYTHONPATH
export PYTHONPATH=PATH_TO_RLLAB_FOLDER:$PYTHONPATH
export PYTHONPATH=PATH_TO_THIS_REPO/sac_release:$PYTHONPATH
Next, install the required packages. Note that OpenAI Baselines also requires CUDA >= 9.0.
pip3 install tensorflow-gpu==VERSION_COMPATIBLE_WITH_YOUR_CUDA_INSTALLATION (note: TensorFlow 2.0.0 is not compatible with this repo)
pip3 install mpi4py roboschool==1.0.48 gym==0.13.0 click dill joblib opencv-python progressbar2 tqdm theano path.py cached_property python-dateutil pyopengl mako gtimer matplotlib pyprind
pip3 install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
To train, run
cd PATH_TO_THIS_REPO
then use
python -m baselines.run --help
python PATH_TO_THIS_REPO/sac_release/examples/mujoco_all_sac.py --help
to list the available arguments, such as the number of environments simulated in parallel, the model save path, etc.
For Soft Actor-Critic, PATH_TO_THIS_REPO/sac_release/examples/variants.py contains the default environment settings; these settings are overridden by command-line arguments.
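That override behavior can be pictured as a simple dictionary merge (an illustrative pure-Python sketch with made-up setting names; the actual mechanism lives in the SAC example script):

```python
def merge_settings(defaults, cli_args):
    # Flags given on the command line take precedence over the defaults
    # in variants.py; flags left unset (None here) fall back to the default.
    merged = dict(defaults)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

defaults = {"env": "humanoid", "dropoutpi": 1.0}   # hypothetical defaults
cli = {"env": None, "dropoutpi": 0.9}              # user passed --dropoutpi=0.9
print(merge_settings(defaults, cli))  # {'env': 'humanoid', 'dropoutpi': 0.9}
```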
l1regpi, l1regvf = L1 policy/value network regularization
l2regpi, l2regvf = L2 policy/value network regularization
wclippi, wclipvf = Policy/value network weight clipping
(Note: for OpenAI Baselines policy weight clipping, we only clip the MLP part of the network, because clipping the log standard deviation vector almost always harms performance.)
dropoutpi, dropoutvf = Policy/value network dropout KEEP_PROB (1.0 = no dropout)
batchnormpi, batchnormvf = Policy/value network batch normalization (True or False)
ent_coef = Entropy regularization coefficient
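As a rough picture of what two of these flags do, here is a minimal pure-Python sketch (illustrative only, not the repository's TensorFlow implementation):

```python
def l2_penalty(weights, reg=1e-4):
    # L2 regularization (what --l2regpi / --l2regvf control):
    # add reg * sum of squared weights to the policy/value loss.
    return reg * sum(w * w for layer in weights for w in layer)

def clip_weights(weights, c=0.5):
    # Weight clipping (what --wclippi / --wclipvf control):
    # after each update, project every weight back into [-c, c].
    return [[max(-c, min(c, w)) for w in layer] for layer in weights]

W = [[0.7, -1.2, 0.1, 0.4]]      # one toy weight "matrix", flattened
print(l2_penalty(W, reg=0.1))    # ~0.21
print(clip_weights(W, c=0.5))    # [[0.5, -0.5, 0.1, 0.4]]
```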
python -m baselines.run --alg=ppo2 --env=RoboschoolHumanoid-v1 --num_timesteps=5e7 --l2regpi=0.0001
Runs ppo2 (Proximal Policy Optimization) on the RoboschoolHumanoid task for 5e7 timesteps, with L2 regularization of strength 0.0001 applied to the policy network.
python -m baselines.run --alg=a2c --env=Humanoid-v2 --num_timesteps=2e7 --ent_coef=0.0 --batchnormpi=True
Runs a2c (the synchronous version of A3C) on the Humanoid (MuJoCo) task for 2e7 timesteps, with batch normalization applied to the policy network and entropy regularization turned off.
python sac_release/examples/mujoco_all_sac.py --env=atlas-forward-walk-roboschool --dropoutpi=0.9
Runs sac (Soft Actor-Critic) on the RoboschoolAtlasForwardWalk task with dropout probability 1 - 0.9 = 0.1 on the policy network (i.e., keep probability 0.9). (Note that the number of training timesteps is predefined in sac_release/examples/variants.py.)
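The KEEP_PROB convention used by the dropout flags can be sketched in plain Python (illustrative only; the repository uses TensorFlow's dropout):

```python
import random

def dropout(activations, keep_prob=0.9):
    # Inverted dropout with the KEEP_PROB convention of --dropoutpi /
    # --dropoutvf: each unit survives with probability keep_prob and is
    # scaled by 1/keep_prob; keep_prob=1.0 disables dropout entirely.
    if keep_prob >= 1.0:
        return list(activations)
    return [a / keep_prob if random.random() < keep_prob else 0.0
            for a in activations]

print(dropout([1.0, 2.0, 3.0], keep_prob=1.0))   # [1.0, 2.0, 3.0] (no dropout)
random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5))  # each unit doubled or zeroed
```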
@inproceedings{liu2021regularization,
title={Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control},
author={Liu, Zhuang and Li, Xuanlin and Kang, Bingyi and Darrell, Trevor},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=yr1mzrH3IC}
}