This repository is the official implementation of OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation (presented at ICML 2021).
- Insert the following commands in `~/.bashrc`:

      export MUJOCO_PY_MJKEY_PATH=$HOME/.mujoco/mjkey.txt
      export MUJOCO_PY_MJPRO_PATH=$HOME/.mujoco/mjpro150/
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin
- Download MuJoCo. Save `mjkey.txt` to `$HOME/.mujoco` after the following commands:

      mkdir -p $HOME/.mujoco
      cd $HOME/.mujoco
      wget https://www.roboti.us/download/mjpro150_linux.zip
      unzip mjpro150_linux.zip
      rm mjpro150_linux.zip
      wget https://www.roboti.us/download/mujoco200_linux.zip
      unzip mujoco200_linux.zip
      rm mujoco200_linux.zip
- Create conda environment and activate it:

      conda env create -f environment.yml
      conda activate optidice
- Install `d4rl`:

      pip install git+https://github.com/rail-berkeley/d4rl@master#egg=d4rl
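
The setup steps above leave several files and environment variables behind, and a missing one usually surfaces only later as an import error. The following is a minimal sketch, not part of this repository, that checks for them before running experiments. The function name `check_mujoco_setup` is hypothetical, and it assumes that `mjpro150_linux.zip` unpacks to `$HOME/.mujoco/mjpro150/bin`, as the `LD_LIBRARY_PATH` entry suggests.

```python
import os
from pathlib import Path


def check_mujoco_setup(environ=None, home=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: it only inspects the paths and variables named in the
    setup steps and does not import mujoco_py or d4rl.
    """
    environ = os.environ if environ is None else environ
    home = Path(home) if home is not None else Path.home()
    mujoco_dir = home / ".mujoco"
    problems = []

    # Files and directories the download step is expected to leave behind.
    for path in (mujoco_dir / "mjkey.txt", mujoco_dir / "mjpro150" / "bin"):
        if not path.exists():
            problems.append(f"missing: {path}")

    # Variables exported in ~/.bashrc.
    for name in ("MUJOCO_PY_MJKEY_PATH", "MUJOCO_PY_MJPRO_PATH"):
        if not environ.get(name):
            problems.append(f"unset: {name}")

    # LD_LIBRARY_PATH should contain the mjpro150 binaries directory.
    expected_bin = str(mujoco_dir / "mjpro150" / "bin")
    entries = [e.rstrip("/") for e in environ.get("LD_LIBRARY_PATH", "").split(":")]
    if expected_bin not in entries:
        problems.append(f"LD_LIBRARY_PATH lacks {expected_bin}")

    return problems


if __name__ == "__main__":
    issues = check_mujoco_setup()
    print("\n".join(issues) if issues else "MuJoCo setup looks complete")
```

Run it in a fresh shell so that the exports from `~/.bashrc` have been loaded.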
- Random MDP experiments:

      python finite_run.py
- D4RL Benchmarks:

      python neural_dice_rl.py \
        --env_name=maze2d-umaze-v1 \
        --policy_extraction=iproj \
        --e_loss_type=mse \
        --alpha=0.001
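
To run the D4RL command over several settings, the invocation can be generated programmatically. The sketch below is an illustration, not part of this repository. It reuses only the four flags shown above, and the helper name `build_commands` and the second `alpha` value are hypothetical. It prints the commands rather than executing them.

```python
import itertools


def build_commands(env_names, alphas, policy_extraction="iproj", e_loss_type="mse"):
    """Return one argument list per (environment, alpha) pair.

    Hypothetical helper: the flag names are copied from the example command
    above; no other options of neural_dice_rl.py are assumed.
    """
    commands = []
    for env_name, alpha in itertools.product(env_names, alphas):
        commands.append([
            "python", "neural_dice_rl.py",
            f"--env_name={env_name}",
            f"--policy_extraction={policy_extraction}",
            f"--e_loss_type={e_loss_type}",
            f"--alpha={alpha}",
        ])
    return commands


if __name__ == "__main__":
    # Illustrative grid: only maze2d-umaze-v1 with alpha=0.001 appears above.
    for cmd in build_commands(["maze2d-umaze-v1"], [0.001, 0.01]):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to launch the runs sequentially.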
If you use this code, please cite our paper:
@inproceedings{lee2021optidice,
author = {Jongmin Lee and Wonseok Jeon and Byung-Jun Lee and Joelle Pineau and Kee-Eung Kim},
title = {OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation},
booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML)},
year = {2021}
}