Code for Expert-Supervised Reinforcement Learning (ESRL). If you use our code, please cite our paper, Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation (BibTeX below).
The repo is set up for the RiverSwim environment and will work with any episodic environment that has discrete state and action spaces.
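As a rough illustration of the kind of environment the code expects, here is a minimal sketch of an episodic chain MDP with discrete states and actions. The class name, reset/step interface, and all transition and reward numbers are assumptions for illustration, not the repo's actual environment API.

```python
# Illustrative only: a minimal episodic chain MDP with discrete states/actions.
# Interface and all numbers are assumptions, not the repo's actual API.
import numpy as np

class ChainMDP:
    def __init__(self, n_states=6, horizon=20, seed=0):
        self.n_states, self.n_actions, self.horizon = n_states, 2, horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def step(self, a):
        # action 0: move left (deterministic); action 1: try to move right (stochastic)
        if a == 0:
            self.s = max(self.s - 1, 0)
        else:
            u = self.rng.random()
            if u < 0.6:
                self.s = min(self.s + 1, self.n_states - 1)
            elif u < 0.65:
                self.s = max(self.s - 1, 0)
        reward = 1.0 if self.s == self.n_states - 1 else (0.01 if self.s == 0 else 0.0)
        self.t += 1
        done = self.t >= self.horizon
        return self.s, reward, done
```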
Running main.py will:
- Train an expert behavior policy with PSRL, if one is not already present
- Generate a training dataset using an epsilon-greedy behavior policy
- Train an ESRL policy
- Evaluate the policy online
- Perform offline policy evaluation with step importance sampling (IS), step weighted importance sampling (WIS), and model-based ESRL to obtain reward estimates (a sketch of the IS/WIS estimators follows this list)
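For reference, here is a minimal sketch of the step-wise IS and WIS reward estimates computed from logged trajectories. The trajectory layout, function name, and policy representation are assumptions for illustration and do not mirror the repo's internals.

```python
# Illustrative sketch of step importance sampling (IS) and weighted IS (WIS)
# reward estimates. Data layout and names are assumptions, not the repo's code.
import numpy as np

def step_is_wis(trajectories, pi_e, pi_b, gamma=1.0):
    """trajectories: list of [(s, a, r), ...]; pi_e / pi_b: arrays of shape (S, A)
    with action probabilities of the evaluation and behavior policies."""
    horizon = max(len(traj) for traj in trajectories)
    n = len(trajectories)
    rho = np.zeros((n, horizon))      # rho[i, t]: cumulative ratio of trajectory i up to step t
    rewards = np.zeros((n, horizon))  # discounted reward at each step
    for i, traj in enumerate(trajectories):
        w = 1.0
        for t, (s, a, r) in enumerate(traj):
            w *= pi_e[s, a] / pi_b[s, a]
            rho[i, t] = w
            rewards[i, t] = (gamma ** t) * r
    v_is = np.mean(np.sum(rho * rewards, axis=1))
    # WIS normalizes the per-step weights by their average across trajectories
    norm = rho.mean(axis=0)
    norm[norm == 0] = 1.0  # guard against division by zero at unused steps
    v_wis = np.mean(np.sum((rho / norm) * rewards, axis=1))
    return v_is, v_wis
```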
Results are saved to a results dictionary, or merged into the existing results dictionary if one is already present.
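As a rough picture of that bookkeeping, the snippet below merges one run's results into a pickled dictionary keyed by seed. The file name and key layout are assumptions, not the repo's actual format.

```python
# Illustrative sketch of merging a run's results into a pickled dictionary.
# File name and key structure are assumptions, not the repo's actual format.
import os
import pickle

def save_results(run_results, seed, path="results.pkl"):
    results = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            results = pickle.load(f)
    results[seed] = run_results  # one entry per seed/run
    with open(path, "wb") as f:
        pickle.dump(results, f)
```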
To begin the process with the default settings, run:
python main.py
The following argument options are available:
python main.py --seed 0 --episodes 300 --risk_aversion .1 --epsilon .1 --MDP_samples_train 250 --MDP_samples_eval 500
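A minimal sketch of how these flags could be parsed with argparse; the defaults, types, and help strings below are assumptions inferred from the command above, not the repo's actual argument definitions.

```python
# Illustrative argparse setup for the flags shown above. Defaults, types, and
# help text are assumptions for this sketch, not the repo's actual definitions.
import argparse

parser = argparse.ArgumentParser(description="ESRL training and evaluation")
parser.add_argument("--seed", type=int, default=0, help="random seed")
parser.add_argument("--episodes", type=int, default=300, help="episodes in the logged training dataset")
parser.add_argument("--risk_aversion", type=float, default=0.1, help="ESRL risk-aversion parameter")
parser.add_argument("--epsilon", type=float, default=0.1, help="epsilon for the epsilon-greedy behavior policy")
parser.add_argument("--MDP_samples_train", type=int, default=250, help="posterior MDP samples used for training")
parser.add_argument("--MDP_samples_eval", type=int, default=500, help="posterior MDP samples used for evaluation")
args = parser.parse_args()
```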
See the ESRL paper or our ESRL video for details on the arguments and the method.
@inproceedings{ASW2020expertsupervised,
author = {Sonabend, Aaron and Lu, Junwei and Celi, Leo Anthony and Cai, Tianxi and Szolovits, Peter},
booktitle = {Advances in Neural Information Processing Systems},
pages = {18967--18977},
title = {Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation},
url = {https://proceedings.neurips.cc/paper/2020/file/daf642455364613e2120c636b5a1f9c7-Paper.pdf},
volume = {33},
year = {2020}
}