This is the codebase for Latent Embeddings for Abstracted Planning (LEAP), from the following paper:
Planning with Goal-Conditioned Policies
Soroush Nasiriany\*, Vitchyr Pong\*, Steven Lin, Sergey Levine
Neural Information Processing Systems (NeurIPS) 2019
arXiv | Website
This guide contains information about (1) Installation, (2) Experiments, and (3) Setting up Your Own Environments.
## Installation

Install and set up the following repos:

- multiworld (contains environments):
  ```bash
  git clone -b leap https://github.com/vitchyr/multiworld
  ```
- doodad (for launching experiments):
  ```bash
  git clone -b leap https://github.com/vitchyr/doodad
  ```
  Follow the instructions in that repo to set it up.
- viskit (for plotting experiments):
  ```bash
  git clone -b leap https://github.com/vitchyr/viskit
  ```
  Follow the instructions in that repo to set it up.
- the current codebase:
  ```bash
  git clone https://github.com/snasiriany/leap
  ```
- install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- add all of the repos to your PYTHONPATH:
  ```bash
  export PYTHONPATH=$PYTHONPATH:/path/to/multiworld/repo
  export PYTHONPATH=$PYTHONPATH:/path/to/doodad/repo
  export PYTHONPATH=$PYTHONPATH:/path/to/viskit/repo
  export PYTHONPATH=$PYTHONPATH:/path/to/leap/repo
  ```
You will need to install docker to run experiments. We have provided a dockerfile with all relevant packages. You will use this dockerfile to build your own docker image.
Before setting up the docker image, you will need to obtain a MuJoCo license to run experiments with the MuJoCo simulator. Obtain the license file mjkey.txt
and save it for reference.
Set up the docker image with the following steps:

```bash
cd docker
# <add mjkey.txt to the current directory>
docker build -t <your-dockerhub-uname>/leap .
docker login --username=<your-dockerhub-uname> --email=<your-email>
docker push <your-dockerhub-uname>/leap
```
## Experiments

You must set up the config file for launching experiments, which provides paths to your code and data directories. Fill in the appropriate paths inside `railrl/config/launcher_config.py`; you can use `railrl/config/launcher_config_template.py` as a reference.
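The exact fields live in the template; as an illustrative sketch (every name below except `LOCAL_LOG_DIR`, which is referenced later in this README, is an assumption based on typical doodad-style setups, not confirmed by this codebase):

```python
# railrl/config/launcher_config.py -- illustrative sketch only;
# copy railrl/config/launcher_config_template.py for the actual fields.

# Where experiment results (progress.csv, etc.) are written.
LOCAL_LOG_DIR = '/home/user/leap-output'

# Hypothetical examples of the kinds of fields such a config typically holds:
DOODAD_DOCKER_IMAGE = '<your-dockerhub-uname>/leap'  # the image built above
CODE_DIRS_TO_MOUNT = [
    '/path/to/multiworld/repo',
    '/path/to/doodad/repo',
    '/path/to/leap/repo',
]
```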
All experiment files are located in `experiments/`. Each file conforms to the following structure:
```python
variant = dict(
    # default hyperparam settings for all envs
)

env_params = {
    '<env1>': {
        # add/override default hyperparam settings for a specific env.
        # each setting is specified as a dictionary address (key),
        # followed by a list of possible options (value). Example:
        # 'rl_variant.algo_kwargs.tdm_kwargs.max_tau': [10, 25, 100],
    },
    '<env2>': {
        ...
    },
}
```
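To make the dictionary-address convention concrete, here is a minimal sketch of how such an address maps onto the nested variant dict (`apply_setting` is a hypothetical helper for illustration, not the codebase's actual sweep utility, which also handles cross-products over the option lists):

```python
def apply_setting(variant, address, value):
    """Set variant[k1][k2]...[kn] = value for a dotted address 'k1.k2...kn'."""
    keys = address.split('.')
    node = variant
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate dicts as needed
    node[keys[-1]] = value

# Example: one option from the list above applied to an empty variant.
variant = {}
apply_setting(variant, 'rl_variant.algo_kwargs.tdm_kwargs.max_tau', 25)
# variant == {'rl_variant': {'algo_kwargs': {'tdm_kwargs': {'max_tau': 25}}}}
```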
You will need to follow four sequential stages to train and evaluate LEAP:

1. Generate the VAE dataset:
   ```bash
   python vae/generate_vae_dataset.py --env <env-name>
   ```
2. Train the VAE. There are two variants, image based (for pm and pnr) and state based (for ant):
   ```bash
   python vae/train_vae.py --env <env-name>        # image based
   python vae/train_vae_state.py --env <env-name>  # state based
   ```
   Before running: locate the corresponding `.npy` file from the previous stage; it contains the VAE dataset. Place the path in your config settings for your env inside the script:
   ```python
   'vae_variant.generate_vae_dataset_kwargs.dataset_path': ['your-npy-path-here'],
   ```
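   If you want to sanity-check the dataset before training the VAE, a minimal sketch (assuming the standard `.npy` format; the exact array layout depends on the env) is:

   ```python
   import numpy as np

   # hypothetical path -- substitute the .npy file produced by stage 1
   data = np.load('your-npy-path-here', allow_pickle=True)
   print(type(data))
   print(getattr(data, 'shape', None))  # inspect the dataset size before training
   ```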
3. Train the RL model. There are two variants (as described in the previous stage):
   ```bash
   python image/train_tdm.py --env <env-name>        # image based
   python state/train_tdm_state.py --env <env-name>  # state based
   ```
   Before running: locate the trained VAE model from the previous stage. Place the path in your config settings for your env inside the script. Complete one of the following options:
   ```python
   'rl_variant.vae_base_path': ['your-base-path-here'],  # folder of VAEs
   'rl_variant.vae_path': ['your-path-here'],  # one VAE
   ```
4. Test the RL model. There are two variants (as described in the previous stage):
   ```bash
   python image/test_tdm.py --env <env-name>        # image based
   python state/test_tdm_state.py --env <env-name>  # state based
   ```
   Before running: locate the trained RL model from the previous stage. Place the path in your config settings for your env inside the script. Complete one of the following options:
   ```python
   'rl_variant.ckpt_base_path': ['your-base-path-here'],  # folder of RL models
   'rl_variant.ckpt': ['your-path-here'],  # one RL model
   ```
See the `parse_args` function in `railrl/misc/exp_util.py` for the complete list of options. Some important options:

- `env`: the env to run (ant, pnr, pm)
- `label`: name for the experiment
- `num_seeds`: number of seeds to run
- `debug`: run with light options for debugging
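For example, a hypothetical invocation combining these options (the flag spellings below are assumptions; check `parse_args` for the exact names) might look like:

```bash
python vae/train_vae.py --env pm --label my-first-run --num_seeds 3 --debug
```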
During training, the results will be saved under:

```
LOCAL_LOG_DIR/<env>/<exp_prefix>/<foldername>
```

- `LOCAL_LOG_DIR` is the directory set by `railrl.config.launcher_config.LOCAL_LOG_DIR`.
- `<exp_prefix>` is the experiment prefix given to `setup_logger`.
- `<foldername>` is auto-generated and based off of `exp_prefix`.
- Inside this folder, you should see a file called `progress.csv`.
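To quickly inspect training metrics without viskit, you can also load `progress.csv` directly. A minimal sketch, assuming pandas is installed (the path below is a placeholder, and the logged column names depend on your run):

```python
import pandas as pd

# hypothetical run directory -- substitute your actual
# LOCAL_LOG_DIR/<env>/<exp_prefix>/<foldername>
df = pd.read_csv('/path/to/LOCAL_LOG_DIR/pm/my-first-run/<foldername>/progress.csv')
print(df.columns.tolist())  # see which metrics were logged
print(df.tail())            # latest logged values
```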
Inside the viskit codebase, run:

```bash
python viskit/frontend.py LOCAL_LOG_DIR/<env>/<exp_prefix>/
```

If visualizing VAE results, add `--dname='vae_progress.csv'` as an option.
## Setting up Your Own Environments

You will need to follow the multiworld template for creating your own environments, and you will need to register your environment. For MuJoCo envs, for example, follow the examples in `multiworld/envs/mujoco/__init__.py` for reference.
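As a rough sketch of what registration looks like (the id, entry point, and kwargs below are hypothetical placeholders, not real multiworld environments; mirror the actual entries in `multiworld/envs/mujoco/__init__.py`):

```python
from gym.envs.registration import register

# Hypothetical entry -- adapt the id, entry_point, and kwargs to your env,
# following the existing registrations in multiworld/envs/mujoco/__init__.py.
register(
    id='MyPointmassEnv-v0',
    entry_point='multiworld.envs.mujoco.my_pointmass:MyPointmassEnv',
    kwargs=dict(
        reward_type='dense',  # placeholder kwarg
    ),
)
```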
Much of the coding infrastructure is based on RLkit, which itself is based on rllab.