Reinforcement Learning with Prior Data (RLPD)

This is code to accompany the paper "Efficient Online Reinforcement Learning with Offline Data", available here. This code can be readily adapted to work on any offline dataset.
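For a sense of what adapting involves: off-policy training here consumes standard (observation, action, reward, mask, next_observation) transitions, so any offline dataset that can be flattened into numpy arrays of that shape will do. A minimal, hypothetical sketch (the helper name and field layout are illustrative, not the repo's actual replay-buffer API):

# Hypothetical sketch: flatten a custom offline dataset into the transition
# arrays that off-policy training consumes. Field names are illustrative.
import numpy as np

def to_transitions(dataset):
    """dataset: dict of equally long arrays keyed like a D4RL dataset."""
    n = len(dataset["observations"])
    assert all(len(v) == n for v in dataset.values())
    return {
        "observations": np.asarray(dataset["observations"], dtype=np.float32),
        "actions": np.asarray(dataset["actions"], dtype=np.float32),
        "rewards": np.asarray(dataset["rewards"], dtype=np.float32),
        # mask = 1 - terminal, i.e. whether to bootstrap from the next state
        "masks": 1.0 - np.asarray(dataset["terminals"], dtype=np.float32),
        "next_observations": np.asarray(dataset["next_observations"], dtype=np.float32),
    }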

Installation

conda create -n rlpd python=3.9 # If you use conda.
conda activate rlpd
conda install patchelf  # If you use conda.
pip install -r requirements.txt
conda deactivate
conda activate rlpd
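Before launching long runs, it is worth confirming that JAX can see your accelerator (plain JAX API; a CUDA-capable jaxlib from requirements.txt should list a GPU device):

# Sanity check: prints your GPU device(s), or [CpuDevice(...)] as a fallback.
import jax
print(jax.devices())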

Experiments

D4RL Locomotion

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=halfcheetah-expert-v0 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 250000 \
                --config=configs/rlpd_config.py \
                --project_name=rlpd_locomotion
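If you want to inspect the prior data this run draws from, the standard d4rl API exposes it directly (a small example using d4rl's documented qlearning_dataset helper; assumes gym and d4rl from requirements.txt):

# Peek at the offline dataset used as prior data for the run above.
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments

env = gym.make("halfcheetah-expert-v0")
dataset = d4rl.qlearning_dataset(env)
# keys: observations, actions, rewards, next_observations, terminals
print({k: v.shape for k, v in dataset.items()})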

D4RL Antmaze

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=antmaze-umaze-v2 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_config.py \
                --config.backup_entropy=False \
                --config.hidden_dims="(256, 256, 256)" \
                --config.num_min_qs=1 \
                --project_name=rlpd_antmaze
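The --config.* flags override fields of an ml_collections ConfigDict at the command line. A simplified sketch of the shape of such a config file (field names inferred from the overrides above; the repo's actual configs/rlpd_config.py has more fields):

# Simplified sketch of an ml_collections config; field names are inferred
# from the command-line overrides above, not copied from the repo.
import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.backup_entropy = True      # the antmaze runs disable this
    config.hidden_dims = (256, 256)   # the antmaze runs widen to (256, 256, 256)
    config.num_min_qs = 1             # subset size for the min over Q-ensemble heads
    return config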

Adroit Binary

First, download and unzip the .npy files into ~/.datasets/awac-data/ from here.
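To verify the download, the demonstration files open with numpy (a hedged check: the filename below is illustrative, and the arrays are pickled, hence allow_pickle=True):

# Hedged sanity check for the downloaded demonstrations; substitute
# whichever .npy file you actually unzipped (this name is illustrative).
import os
import numpy as np

path = os.path.expanduser("~/.datasets/awac-data/pen2_sparse.npy")
trajs = np.load(path, allow_pickle=True)
print(type(trajs), len(trajs))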

Make sure you have mjrl installed:

git clone https://github.com/aravindr93/mjrl
cd mjrl
pip install -e .

Then, recursively clone mj_envs from this fork:

git clone --recursive https://github.com/philipjball/mj_envs.git

Then sync the submodules (add the --init flag if you didn't recursively clone):

cd mj_envs
git submodule update --remote

Finally:

pip install -e .
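To confirm the environments registered correctly (importing mj_envs is what registers them with gym; the env name is taken from the command below):

# Quick check that the Adroit binary tasks registered with gym.
import gym
import mj_envs  # noqa: F401 -- importing registers the environments

env = gym.make("pen-binary-v0")
print(env.observation_space, env.action_space)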

Now you can run the following in this directory:

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=pen-binary-v0 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 1000000 \
                --config=configs/rlpd_config.py \
                --config.backup_entropy=False \
                --config.hidden_dims="(256, 256, 256)" \
                --project_name=rlpd_adroit

V-D4RL

These are pixel-based datasets for offline RL (paper here).

Download the 64px Main V-D4RL datasets into ~/.vd4rl here or here.

For instance, the Medium Cheetah Run .npz files should be in ~/.vd4rl/main/cheetah_run/medium/64px.
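A quick way to check that the files landed in the right place and see what they contain (hedged: chunk filenames vary across the release, so this just grabs the first match):

# Inspect one downloaded V-D4RL chunk; filenames vary, so take the first match.
import glob
import os
import numpy as np

files = sorted(glob.glob(os.path.expanduser("~/.vd4rl/main/cheetah_run/medium/64px/*.npz")))
with np.load(files[0]) as chunk:
    print({k: chunk[k].shape for k in chunk.files})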

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_pixels_config.py \
                --project_name=rlpd_vd4rl