LEAP

This is the codebase for Latent Embeddings for Abstracted Planning (LEAP), from the following paper:

Planning with Goal Conditioned Policies
Soroush Nasiriany*, Vitchyr Pong*, Steven Lin, Sergey Levine
Neural Information Processing Systems 2019
Arxiv | Website

This guide contains information about (1) Installation, (2) Experiments, and (3) Setting up Your Own Environments.

Installation

Download Code

  • multiworld (contains environments): git clone -b leap https://github.com/vitchyr/multiworld
  • doodad (for launching experiments): git clone -b leap https://github.com/vitchyr/doodad
  • viskit (for plotting experiments): git clone -b leap https://github.com/vitchyr/viskit
  • Current codebase: git clone https://github.com/snasiriany/leap
    • Install dependencies: pip install -r requirements.txt

Add paths

export PYTHONPATH=$PYTHONPATH:/path/to/multiworld/repo
export PYTHONPATH=$PYTHONPATH:/path/to/doodad/repo
export PYTHONPATH=$PYTHONPATH:/path/to/viskit/repo
export PYTHONPATH=$PYTHONPATH:/path/to/leap/repo

Setup Docker Image

You will need to install Docker to run experiments. We have provided a Dockerfile with all the relevant packages; you will use it to build your own Docker image.

Before setting up the Docker image, you will need a MuJoCo license to run experiments with the MuJoCo simulator. Obtain the license file mjkey.txt and keep it at hand; you will copy it into the docker build directory in the steps below.

Set up the docker image with the following steps:

cd docker
<add mjkey.txt to current directory>
docker build -t <your-dockerhub-uname>/leap .
docker login --username=<your-dockerhub-uname>
docker push <your-dockerhub-uname>/leap

Setup Config File

You must set up the config file for launching experiments by providing paths to your code and data directories. Inside railrl/config/launcher_config.py, fill in the appropriate paths; you can use railrl/config/launcher_config_template.py as a reference.
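
For reference, the config is a plain Python module of constants. Below is a minimal sketch, not the authoritative template: LOCAL_LOG_DIR is the only name referenced elsewhere in this guide (see the plotting section), while the other constant names and paths are illustrative assumptions; copy the exact variable names from launcher_config_template.py.

# railrl/config/launcher_config.py -- minimal sketch; only LOCAL_LOG_DIR is
# confirmed by this guide, the other constant names are assumptions.
import os

# Where experiment results (progress.csv, snapshots, etc.) are written locally.
LOCAL_LOG_DIR = os.path.expanduser('~/leap-logs')

# Docker image built in the "Setup Docker Image" step (hypothetical constant name).
DOODAD_DOCKER_IMAGE = '<your-dockerhub-uname>/leap'

# MuJoCo license location on the host (hypothetical constant name).
MUJOCO_KEY_PATH = os.path.expanduser('~/.mujoco/mjkey.txt')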

Experiments

All experiment files are located in experiments. Each file conforms to the following structure:

variant = dict(
  # default hyperparam settings for all envs
)

env_params = {
  '<env1>' : {
    # add/override default hyperparam settings for specific env
    # each setting is specified as a dictionary address (key),
    # followed by a list of possible options (value).
    # Example in following line:
    # 'rl_variant.algo_kwargs.tdm_kwargs.max_tau': [10, 25, 100],
  },
  '<env2>' : {
    ...
  },
}
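
Each env_params entry thus defines a sweep: the dotted key names a nested entry of the variant dict, and the list holds the values to try. The helper below is only an illustrative sketch of how such a dotted address resolves against the nested dict; it is not the repo's actual sweep code.

# Illustrative only: how a dotted address such as
# 'rl_variant.algo_kwargs.tdm_kwargs.max_tau' maps onto the nested variant dict.
def set_nested(variant, dotted_key, value):
    """Write `value` into `variant` at the location named by `dotted_key`."""
    keys = dotted_key.split('.')
    node = variant
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

variant = {'rl_variant': {'algo_kwargs': {'tdm_kwargs': {}}}}
set_nested(variant, 'rl_variant.algo_kwargs.tdm_kwargs.max_tau', 25)
# variant == {'rl_variant': {'algo_kwargs': {'tdm_kwargs': {'max_tau': 25}}}}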

Running Experiments

You will need to follow four sequential stages to train and evaluate LEAP:

Stage 1: Generate VAE Dataset

python vae/generate_vae_dataset.py --env <env-name>

Stage 2: Train VAE

Train the VAE. There are two variants: image-based (for pm and pnr) and state-based (for ant):

python vae/train_vae.py --env <env-name>
python vae/train_vae_state.py --env <env-name>

Before running: locate the .npy file generated in the previous stage (this file contains the VAE dataset) and place its path in the config settings for your env inside the script:

'vae_variant.generate_vae_dataset_kwargs.dataset_path': ['your-npy-path-here'],
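
As a concrete sketch, the entry lives in the env_params block of the training script, keyed by your env; the .npy path below is a placeholder for wherever stage 1 saved the dataset. Stages 3 and 4 follow the same pattern with the rl_variant.vae_path / rl_variant.vae_base_path and rl_variant.ckpt / rl_variant.ckpt_base_path keys respectively.

env_params = {
  'pnr' : {
    # placeholder path -- use the .npy file written by generate_vae_dataset.py
    'vae_variant.generate_vae_dataset_kwargs.dataset_path':
        ['/path/to/your/vae_dataset.npy'],
  },
}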

Stage 3: Train RL

Train the RL model. There are two variants (as described in the previous stage):

python image/train_tdm.py --env <env-name>
python state/train_tdm_state.py --env <env-name>

Before running: locate the trained VAE model from the previous stage and place its path in the config settings for your env inside the script. Set one of the following options:

'rl_variant.vae_base_path': ['your-base-path-here'], # folder of vaes
'rl_variant.vae_path': ['your-path-here'], # one vae

Stage 4: Test RL

Test the RL model. There are two variants (as described in the previous stage):

python image/test_tdm.py --env <env-name>
python state/test_tdm_state.py --env <env-name>

Before running: locate the trained RL model from the previous stage and place its path in the config settings for your env inside the script. Set one of the following options:

'rl_variant.ckpt_base_path': ['your-base-path-here'], # folder of RL models
'rl_variant.ckpt': ['your-path-here'], # one RL model

Experiment Options

See the parse_args function in railrl/misc/exp_util.py for the complete list of options. Some important options:

  • env: the env to run (ant, pnr, pm)
  • label: name for experiment
  • num_seeds: number of seeds to run
  • debug: run with light options for debugging

Plotting Experiment Results

During training, results will be saved under:

LOCAL_LOG_DIR/<env>/<exp_prefix>/<foldername>
  • LOCAL_LOG_DIR is the directory set by railrl.config.launcher_config.LOCAL_LOG_DIR
  • <exp_prefix> is the name given to setup_logger.
  • <foldername> is auto-generated based on exp_prefix.
  • Inside this folder, you should see a file called progress.csv.

Inside the viskit codebase, run:

python viskit/frontend.py LOCAL_LOG_DIR/<env>/<exp_prefix>/

If visualizing VAE results, add --dname='vae_progress.csv' as an option.
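
If you prefer to inspect a run without viskit, progress.csv (and vae_progress.csv) are ordinary CSV files and can be loaded directly, for example with pandas. The snippet below is a sketch; the metric names differ per experiment, so print the columns to see what was actually logged.

# Quick manual inspection of a run's progress.csv without viskit.
import pandas as pd

df = pd.read_csv('LOCAL_LOG_DIR/<env>/<exp_prefix>/<foldername>/progress.csv')
print(df.columns.tolist())   # metrics logged for this run
print(df.iloc[-1])           # most recent logged values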

Setting up Your Own Environments

Follow the multiworld template to create your own environments, and make sure to register each new environment. For MuJoCo envs, for example, see multiworld/envs/mujoco/__init__.py for reference.
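
As a rough sketch, multiworld environments are registered through gym's registry; the id, entry point, and kwargs below are hypothetical, so mirror the actual register calls in multiworld/envs/mujoco/__init__.py.

# Hypothetical registration sketch -- copy the conventions from
# multiworld/envs/mujoco/__init__.py for real environments.
from gym.envs.registration import register

register(
    id='MyCustomEnv-v0',
    entry_point='multiworld.envs.mujoco.my_custom_env:MyCustomEnv',
    kwargs={'reward_type': 'dense'},  # hypothetical kwarg
)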

Credit

Much of the coding infrastructure is based on RLkit, which itself is based on rllab.