Deeply-Debiased Off-Policy Interval Estimation (D2OPE)

This repository is the official implementation of the paper "Deeply-Debiased Off-Policy Interval Estimation" (ICML 2021) in Python.

Summary of the paper

Off-policy evaluation (OPE) estimates a target policy's value from a historical dataset generated by a different behavior policy. Beyond a point estimate, many applications benefit from a confidence interval (CI) that quantifies the uncertainty of that estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments.
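To make the problem setting concrete, here is a minimal sketch of off-policy interval estimation on a toy one-step problem. It uses plain importance sampling with a normal-approximation CI, which is a textbook baseline and not the procedure proposed in the paper. The function name `is_confidence_interval`, the two-action setup, and all numeric values are assumptions made up for illustration.

```python
import math
import random
import statistics

def is_confidence_interval(actions, rewards, behavior_probs, target_probs, z=1.96):
    """Importance-sampling point estimate with a normal-approximation CI.

    actions: actions logged under the behavior policy
    rewards: observed rewards, one per logged action
    behavior_probs / target_probs: per-action probabilities under each policy
    """
    # Reweight each reward by the ratio of target to behavior probability.
    weighted = [
        target_probs[a] / behavior_probs[a] * r
        for a, r in zip(actions, rewards)
    ]
    estimate = statistics.fmean(weighted)
    std_err = statistics.stdev(weighted) / math.sqrt(len(weighted))
    return estimate, (estimate - z * std_err, estimate + z * std_err)

# Hypothetical toy problem: two actions, noisy rewards.
rng = random.Random(0)
behavior_probs = [0.5, 0.5]   # policy that generated the data
target_probs = [0.2, 0.8]     # policy we want to evaluate
mean_reward = [0.0, 1.0]

actions = rng.choices([0, 1], weights=behavior_probs, k=5000)
rewards = [mean_reward[a] + rng.gauss(0.0, 0.1) for a in actions]

estimate, (lower, upper) = is_confidence_interval(
    actions, rewards, behavior_probs, target_probs
)
true_value = sum(p * m for p, m in zip(target_probs, mean_reward))
print(f"estimate={estimate:.3f}, CI=({lower:.3f}, {upper:.3f}), true={true_value:.3f}")
```

In sequential problems the importance weights multiply across time steps, so intervals of this kind can become very wide. That is one reason interval procedures beyond this baseline are of interest.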

[Figures: Method, Results]

File overview

Code files in the main folder:
1. Methods:
  1. _TRIPLE.py: main function for the proposed method
  2. _IS.py: code to implement the two IS-based competing methods
2. Environment:
  1. _Ohio_Simulator.py: simulator for the Diabetes environment
  2. _cartpole.py: simulator for the CartPole environment, forked from OpenAI Gym with slight modifications
3. _util.py: helper functions
4. _analyze.py: post-process simulation results
/density: functions for estimating the two density ratio functions
/coinDice: code for the competing method CoinDICE, forked from https://github.com/google-research/dice_rl
/target_policies: checkpoints for the learned target policies
/RL: some useful RL functions
  1. DQN.py and FQI.py: implementation of the target/behaviour policies
  2. FQE.py: function for estimating the initial Q function
  3. my_gym.py: helper functions for training
  4. sampler.py: samplers and replay buffers
/TOY: code to generate the two plots for toy examples
  1. TOY_coverage.ipynb: for the plot showing the CI coverage
  2. TOY_TRIPLY.ipynb: for the plot showing the triply robust property
  3. _plot.py: helper functions for plotting
  4. _discrete.py: TR method for discrete state space
/script: scripts to run the experiments.
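
For readers unfamiliar with fitted Q evaluation (the role FQE.py plays in the list above), the following is a minimal tabular sketch of the idea: repeatedly regress the Bellman target onto state-action pairs using logged transitions. It is a simplification for illustration and not the code in FQE.py. The function `tabular_fqe`, the two-state environment, and the always-take-action-1 target policy are all hypothetical.

```python
import random
from collections import defaultdict

def tabular_fqe(transitions, target_policy, gamma=0.9, n_iters=200):
    """Iterate the Bellman evaluation backup on logged (s, a, r, s') tuples."""
    q = defaultdict(float)  # Q estimates, initialised at zero
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in transitions:
            # Regression target: reward plus discounted Q at the target policy's action.
            sums[(s, a)] += r + gamma * q[(s_next, target_policy(s_next))]
            counts[(s, a)] += 1
        # In the tabular case, least-squares regression is a per-cell average.
        q = defaultdict(float, {sa: sums[sa] / counts[sa] for sa in sums})
    return q

# Hypothetical environment: the next state equals the action, reward is 1 for action 1.
rng = random.Random(0)
transitions, s = [], 0
for _ in range(500):
    a = rng.randrange(2)          # uniformly random behavior policy
    r = float(a == 1)
    s_next = a
    transitions.append((s, a, r, s_next))
    s = s_next

# Evaluate the target policy that always takes action 1.
q = tabular_fqe(transitions, target_policy=lambda state: 1)
print(f"Q(0, 1) = {q[(0, 1)]:.3f}, Q(0, 0) = {q[(0, 0)]:.3f}")
```

With a discount factor of 0.9, the target policy earns reward 1 at every step, so its value is 1 / (1 - 0.9) = 10. Taking action 0 first forfeits one reward and gives 0.9 * 10 = 9.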

Reproduce simulation results

To reproduce our simulation results, follow these steps:

1. Install the required packages.
2. Change the working directory to the main folder.
3. Open the Jupyter notebook and modify the hyper-parameters.
4. Run the notebook and analyze the output results.
