Expert-Supervised Reinforcement Learning (ESRL)

Code for Expert-Supervised Reinforcement Learning (ESRL). If you use our code, please cite our paper, Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation.

The repo is set up for the Riverswim environment and will work with any episodic environment that has discrete state and action spaces.

Overview

Running main.py will:

  1. Train an expert behavior policy with PSRL if one is not already present
  2. Generate a training dataset using an epsilon-greedy behavior policy
  3. Train an ESRL policy
  4. Evaluate the policy online
  5. Perform offline policy evaluation with stepwise importance sampling (IS), stepwise weighted importance sampling (WIS), and model-based ESRL to obtain reward estimates (a sketch of the IS/WIS estimators follows this list)
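
For reference, here is a minimal sketch of the stepwise IS and WIS estimators named in step 5, written for fixed-horizon logged episodes. The trajectory format, the function name, and the fixed-horizon assumption are illustrative only and are not the repo's actual interface.

# Minimal sketch of stepwise (per-decision) IS and WIS estimators.
# Assumes fixed-horizon episodes; trajectory format is illustrative.
import numpy as np

def stepwise_is_wis(trajectories, gamma=1.0):
    """trajectories: list of episodes, each a list of
    (pi_e_prob, pi_b_prob, reward) tuples of equal length,
    where pi_e_prob / pi_b_prob are the evaluation / behavior
    policy probabilities of the logged action."""
    n = len(trajectories)
    horizon = len(trajectories[0])
    rho = np.ones((n, horizon))      # cumulative importance weight per step
    rewards = np.zeros((n, horizon))
    for i, episode in enumerate(trajectories):
        w = 1.0
        for t, (pi_e, pi_b, r) in enumerate(episode):
            w *= pi_e / pi_b
            rho[i, t] = w
            rewards[i, t] = r

    discounts = gamma ** np.arange(horizon)
    # Stepwise IS: average of discounted, importance-weighted rewards.
    v_is = float(np.mean(np.sum(discounts * rho * rewards, axis=1)))
    # Stepwise WIS: normalize the weights at each step by their mean.
    norm = rho.mean(axis=0)
    v_wis = float(np.mean(np.sum(discounts * (rho / norm) * rewards, axis=1)))
    return v_is, v_wis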

Results are saved to a new dictionary or merged into the existing results dictionary.
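
A minimal sketch of that save-or-merge behavior is below, assuming the results dictionary is pickled to disk; the file name, key layout, and use of pickle are assumptions, not the repo's actual storage format.

# Sketch: merge new results into an existing pickled results dictionary.
import os
import pickle

def update_results(new_results, path="results.pkl"):
    results = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            results = pickle.load(f)   # load previously saved runs
    results.update(new_results)        # add this run's entries
    with open(path, "wb") as f:
        pickle.dump(results, f)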

Usage

To run with default settings:

python main.py

The following arguments are available:

python main.py --seed 0 --episodes 300 --risk_aversion .1 --epsilon .1 --MDP_samples_train 250 --MDP_samples_eval 500
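
For orientation, these options might be declared with argparse roughly as follows. The argument names are taken from the command above; the default values and help strings are assumptions, not necessarily what main.py uses.

# Sketch of an argparse declaration for the options listed above.
import argparse

parser = argparse.ArgumentParser(description="Run ESRL on Riverswim")
parser.add_argument("--seed", type=int, default=0, help="random seed")
parser.add_argument("--episodes", type=int, default=300,
                    help="number of episodes in the logged training dataset")
parser.add_argument("--risk_aversion", type=float, default=0.1,
                    help="risk-aversion level used by the ESRL policy")
parser.add_argument("--epsilon", type=float, default=0.1,
                    help="epsilon for the epsilon-greedy behavior policy")
parser.add_argument("--MDP_samples_train", type=int, default=250,
                    help="posterior MDP samples drawn during training")
parser.add_argument("--MDP_samples_eval", type=int, default=500,
                    help="posterior MDP samples drawn for evaluation")
args = parser.parse_args()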

See the ESRL paper or our ESRL video for details on the arguments and the method.

BibTeX

@article{ASW2020expertsupervised,
    title={Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation},
    author={Aaron Sonabend and Junwei Lu and Leo A. Celi and Tianxi Cai and Peter Szolovits},
    year={2020},
    eprint={2006.13189},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}