Code for PSRL algorithm

Posterior Sampling for Reinforcement Learning

This repository replicates results from the paper that introduces the Posterior Sampling for Reinforcement Learning (PSRL) algorithm:

Osband, I., Russo, D., & Van Roy, B. (2013). (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems, 26.
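
For orientation, the core loop of PSRL (sample a model from the posterior, plan in the sampled model, act for one episode, update the posterior) can be sketched as below. This is a minimal illustration and not the code in this repository: it assumes a tabular MDP, a Dirichlet posterior over transitions, and known mean rewards R[s, a]. The function names sample_transitions, solve_finite_horizon, and psrl_episode are hypothetical.

```python
import numpy as np

def sample_transitions(counts, rng):
    """Draw one transition model P[s, a, :] from a Dirichlet posterior.

    counts[s, a, s'] holds positive prior pseudo-counts plus observed transitions.
    """
    n_states, n_actions, _ = counts.shape
    P = np.empty_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(counts[s, a])
    return P

def solve_finite_horizon(P, R, horizon):
    """Backward induction; returns a policy of shape (horizon, n_states)."""
    n_states = P.shape[0]
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V  # Q[s, a] = R[s, a] + sum over s' of P[s, a, s'] * V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl_episode(counts, R, P_true, horizon, rng, start_state=0):
    """Run one PSRL episode against P_true and update counts in place."""
    P_sampled = sample_transitions(counts, rng)       # posterior sample
    policy = solve_finite_horizon(P_sampled, R, horizon)  # plan in the sample
    s, total = start_state, 0.0
    for h in range(horizon):
        a = policy[h, s]
        s_next = rng.choice(P_true.shape[2], p=P_true[s, a])
        total += R[s, a]
        counts[s, a, s_next] += 1  # posterior update for the Dirichlet model
        s = s_next
    return total
```

Calling psrl_episode repeatedly with the same counts array makes the sampled models concentrate around the true transition model as data accumulates.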

The current codebase supports the following RL environments:

TwoRoom and FourRoom gridworld environments

Installation

  1. Create conda environment
cd psrl/
conda create --name psrl python=3.9
conda activate psrl
  2. Install requirements
pip install -r requirements.txt
pip install -e .

Running experiments

To replicate all plots, first run the optimization process for each agent and environment:

python scripts/generate_data.py --config configs/riverswim_psrl.yaml --seed 0

This script produces the files agent.pkl and trajectories.pkl, which store the trained parameters of the optimized agent and the trajectories taken in the environment during the run. Choose any of the configuration files in the configs folder to generate the data for a specific experiment.
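
The two output files can be inspected from Python once a run finishes. The sketch below assumes they are ordinary pickle files. The output directory and the structure of the stored objects are not documented here, so the path runs/riverswim_psrl_0 is a placeholder and load_pickle is a hypothetical helper.

```python
import pickle
from pathlib import Path

def load_pickle(path):
    """Load a single pickled object from `path`."""
    with Path(path).open("rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    # Placeholder: replace with the directory your run actually wrote to.
    run_dir = Path("runs/riverswim_psrl_0")
    for name in ("agent.pkl", "trajectories.pkl"):
        file_path = run_dir / name
        if file_path.exists():
            obj = load_pickle(file_path)
            print(name, type(obj))
        else:
            print(f"{name} not found in {run_dir}")
```

Unpickling agent.pkl will most likely require the psrl package to be importable, so run this inside the conda environment created during installation. Only unpickle files you produced yourself or otherwise trust.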

The most straightforward way to obtain all the data needed for the plots is to run the following script:

. run_parallel.sh

which launches all combinations of environments (riverswim, tworoom, fourroom), agents (psrl, ucrl, kl_ucrl), and seeds (10 in total, starting at 0) using screen.
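
To make the size of that sweep concrete, the sketch below enumerates the same grid in Python. It is not a transcription of run_parallel.sh: the config naming pattern configs/<env>_<agent>.yaml and the spelling klucrl are inferred from the two config file names shown in this README, and the sketch only builds command strings without starting screen sessions.

```python
from itertools import product

ENVS = ["riverswim", "tworoom", "fourroom"]
AGENTS = ["psrl", "ucrl", "klucrl"]  # assumed config-file spelling of kl_ucrl
SEEDS = range(10)  # seeds 0 through 9

def build_commands():
    """Return one generate_data.py command line per (env, agent, seed)."""
    return [
        f"python scripts/generate_data.py --config configs/{env}_{agent}.yaml --seed {seed}"
        for env, agent, seed in product(ENVS, AGENTS, SEEDS)
    ]

if __name__ == "__main__":
    commands = build_commands()
    print(len(commands))  # 3 environments x 3 agents x 10 seeds = 90
    print(commands[0])
```

Check the generated config paths against the files that actually exist in the configs folder before running any of these commands.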

After all runs come to an end, you can obtain regret plots by running

python scripts/plot_regret.py --config configs/regret_riverswim.yaml

Switch between the following configs to obtain a regret plot for each environment:

  • configs/regret_riverswim.yaml
  • configs/regret_tworoom.yaml
  • configs/regret_fourroom.yaml

With configs/regret_riverswim.yaml you should expect the following plot

[Figure: Regret for Exploration Algorithms]
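
As a reminder of what a regret plot shows, cumulative regret is commonly defined as the running gap between the reward an optimal policy would collect and the reward the agent actually collected. The sketch below uses that common per-step definition. The inputs are made up for illustration, and scripts/plot_regret.py may compute regret differently.

```python
from itertools import accumulate

def cumulative_regret(rewards, optimal_reward_per_step):
    """Running sum of (optimal per-step reward - observed reward)."""
    return list(accumulate(optimal_reward_per_step - r for r in rewards))

if __name__ == "__main__":
    observed = [0.0, 0.0, 1.0, 1.0]  # illustrative rewards, not real data
    print(cumulative_regret(observed, optimal_reward_per_step=1.0))
    # prints [1.0, 2.0, 2.0, 2.0]
```

A curve that flattens, as in this toy example, indicates that the agent has stopped losing reward relative to the optimal policy.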

Likewise, with a single run you can obtain agent-specific plots for gridworld environments by running

python scripts/plot_agent.py --config configs/tworoom_klucrl.yaml

Pass the configuration that matches a particular run to obtain its set of plots. You should obtain all of the following plots:

  • Action-value function
  • Empirical state visitation
  • Empirical total reward
  • Expected reward
  • Policy
  • State-value function

For configs/tworoom_klucrl.yaml (after setting no_goal=False) you should expect the following

  • [Figure: Action-value function]
  • [Figure: Empirical state visitations]
  • [Figure: Empirical total reward]
  • [Figure: Expected reward]
  • [Figure: Policy]
  • [Figure: State-value function]

For configs/fourroom_klucrl.yaml (after setting no_goal=False) you should expect the following

  • [Figure: Action-value function]
  • [Figure: Empirical state visitations]
  • [Figure: Empirical total reward]
  • [Figure: Expected reward]
  • [Figure: Policy]
  • [Figure: State-value function]

Disclaimer

This project includes several other scripts that are undocumented. They were written for a research project that was left unfinished, so they are not directly connected to the original paper. Likewise, there is no guarantee that they produce meaningful results.