Replication code for "Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies" by Kallus and Uehara

To replicate

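From the directory containing script.sh, run:

    seq 3 | xargs -L 1 -P 3 ./script.sh

This invokes script.sh three times, with the arguments 1, 2, and 3, running up to three invocations in parallel.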
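
To preview what the pipeline will launch without starting any runs, the same command can be dry-run by placing echo in front of the script name. This is only a sketch: it assumes a standard xargs that supports -L and -P (GNU and BSD both do), and it uses -P 1 so the preview prints in a fixed order. What the numeric argument means inside script.sh is not described here, so the sketch makes no assumption about it.

```shell
# Dry run: print each command xargs would execute instead of running it.
# seq 3 emits the lines 1, 2, 3; -L 1 starts one command per input line.
# -P 1 (serial) is used here only so the preview order is deterministic;
# the real command uses -P 3 to run up to three jobs at once.
dry_run=$(seq 3 | xargs -L 1 -P 1 echo ./script.sh)
echo "$dry_run"
```

Each printed line is one invocation of script.sh with a single argument. On a machine with fewer than three free cores, lowering the -P value in the real command runs the same three jobs with less parallelism.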