This is the codebase for Latent Embeddings for Abstracted Planning (LEAP), from the following paper:
Planning with Goal Conditioned Policies
Soroush Nasiriany*, Vitchyr Pong*, Steven Lin, Sergey Levine
Neural Information Processing Systems 2019
Arxiv | Website
This guide contains information about (1) Installation, (2) Experiments, and (3) Setting up Your Own Environments.
Download the following repos:

- multiworld (contains environments):
  git clone -b leap https://github.com/vitchyr/multiworld
- doodad (for launching experiments):
  git clone -b leap https://github.com/vitchyr/doodad
  Follow the instructions in the repo to set it up.
- viskit (for plotting experiments):
  git clone -b leap https://github.com/vitchyr/viskit
  Follow the instructions in the repo to set it up.
- Current codebase:
  git clone https://github.com/snasiriany/leap
  Install dependencies:
  pip install -r requirements.txt

Add each repo to your PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:/path/to/multiworld/repo
export PYTHONPATH=$PYTHONPATH:/path/to/doodad/repo
export PYTHONPATH=$PYTHONPATH:/path/to/viskit/repo
export PYTHONPATH=$PYTHONPATH:/path/to/leap/repo
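As an optional sanity check, you can confirm the paths are set up correctly by importing the packages (this assumes the package names match the repo layout above, with railrl being the package inside the leap repo):

python -c "import multiworld, doodad, viskit, railrl"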
You will need to install docker to run experiments. We have provided a Dockerfile with all relevant packages; you will use this Dockerfile to build your own docker image.
Before setting up the docker image, you will need to obtain a MuJoCo license to run experiments with the MuJoCo simulator. Obtain the license file mjkey.txt and save it for reference.
Set up the docker image with the following steps:
cd docker
<add mjkey.txt to current directory>
docker build -t <your-dockerhub-uname>/leap .
docker login --username=<your-dockerhub-uname>
docker push <your-dockerhub-uname>/leap
You must set up the config file for launching experiments by providing paths to your code and data directories. Inside railrl/config/launcher_config.py, fill in the appropriate paths. You can use railrl/config/launcher_config_template.py as a reference.
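For orientation, a minimal sketch of what launcher_config.py might contain is below. LOCAL_LOG_DIR is the only name referenced elsewhere in this guide; the other field names are hypothetical placeholders, so treat launcher_config_template.py as the authoritative list of fields.

import os

# Where experiment results (e.g. progress.csv) are written locally.
# Referenced in the plotting instructions later in this guide.
LOCAL_LOG_DIR = os.path.expanduser('~/leap-logs')

# The docker image built and pushed in the previous step (hypothetical field name).
DOODAD_DOCKER_IMAGE = '<your-dockerhub-uname>/leap'

# Path to your MuJoCo license key (hypothetical field name).
MUJOCO_KEY_PATH = os.path.expanduser('~/.mujoco/mjkey.txt')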
All experiment files are located in experiments. Each file conforms to the following structure:
variant = dict(
# default hyperparam settings for all envs
)
env_params = {
'<env1>' : {
# add/override default hyperparam settings for specific env
# each setting is specified as a dictionary address (key),
# followed by list of possible options (value).
# Example in following line:
# 'rl_variant.algo_kwargs.tdm_kwargs.max_tau': [10, 25, 100],
},
'<env2>' : {
...
},
}
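Each dotted address is resolved into nested keys of the variant dict, and since each value is a list of options, the launcher presumably sweeps over the listed candidates. A minimal sketch of the override mechanic (apply_override is a hypothetical helper for illustration, not the repo's actual function):

def apply_override(variant, address, value):
    # Walk the dotted address down the nested variant dict and set the leaf.
    keys = address.split('.')
    node = variant
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

variant = {'rl_variant': {'algo_kwargs': {'tdm_kwargs': {}}}}
apply_override(variant, 'rl_variant.algo_kwargs.tdm_kwargs.max_tau', 25)
# variant is now {'rl_variant': {'algo_kwargs': {'tdm_kwargs': {'max_tau': 25}}}}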
You will need to follow four sequential stages to train and evaluate LEAP:

Generate the VAE dataset:
python vae/generate_vae_dataset.py --env <env-name>
Train the VAE. There are two variants, image-based (for pm and pnr) and state-based (for ant):
python vae/train_vae.py --env <env-name>
python vae/train_vae_state.py --env <env-name>
Before running: locate the corresponding .npy file from the previous stage (it contains the VAE dataset) and place its path in the config settings for your env inside the script:
'vae_variant.generate_vae_dataset_kwargs.dataset_path': ['your-npy-path-here'],
Train the RL model. There are two variants (as described in the previous stage):
python image/train_tdm.py --env <env-name>
python state/train_tdm_state.py --env <env-name>
Before running: locate the trained VAE model from the previous stage and place its path in the config settings for your env inside the script, using one of the following options:
'rl_variant.vae_base_path': ['your-base-path-here'], # folder of vaes
'rl_variant.vae_path': ['your-path-here'], # one vae
Test the RL model. There are two variants (as described in the previous stage):
python image/test_tdm.py --env <env-name>
python state/test_tdm_state.py --env <env-name>
Before running: locate the trained RL model from the previous stage and place its path in the config settings for your env inside the script, using one of the following options:
'rl_variant.ckpt_base_path': ['your-base-path-here'], # folder of RL models
'rl_variant.ckpt': ['your-path-here'], # one RL model
See the parse_args function in railrl/misc/exp_util.py for the complete list of options. Some important options:
- env: the env to run (ant, pnr, pm)
- label: name for the experiment
- num_seeds: number of seeds to run
- debug: run with light options for debugging
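For example, a hypothetical invocation that trains the image-based RL model on the point mass env with three seeds (flag spellings assumed to match parse_args):

python image/train_tdm.py --env pm --label my-experiment --num_seeds 3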
During training, results will be saved under:
LOCAL_LOG_DIR/<env>/<exp_prefix>/<foldername>
- LOCAL_LOG_DIR is the directory set by railrl.config.launcher_config.LOCAL_LOG_DIR.
- <exp_prefix> is given to setup_logger.
- <foldername> is auto-generated and based off of exp_prefix.
- Inside this folder, you should see a file called progress.csv.
Inside the viskit codebase, run:
python viskit/frontend.py LOCAL_LOG_DIR/<env>/<exp_prefix>/
If visualizing VAE results, add --dname='vae_progress.csv' as an option.
You will need to follow the multiworld template for creating your own environments, and you will need to register each environment. For MuJoCo envs, for example, see multiworld/envs/mujoco/__init__.py for reference.
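For orientation, multiworld registers environments through gym's standard registry; a minimal sketch is below, in which the env id, module path, class name, and kwargs are all hypothetical:

from gym.envs.registration import register

register(
    id='MyGoalEnv-v0',  # hypothetical env id
    entry_point='multiworld.envs.mujoco.my_goal_env:MyGoalEnv',  # hypothetical module:class
    kwargs={'reward_type': 'dense'},  # hypothetical constructor kwargs
)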
Much of the coding infrastructure is based on rlkit, which itself is based on rllab.