Least Squares Policy Iteration in Python

License: BSD 2-Clause "Simplified" License

Author: Jeremy Stober
Contact: stober@gmail.com
Version: 0.1

This is a Python implementation of Least Squares Policy Iteration
(LSPI), as described by Lagoudakis and Parr in JMLR (2003). The code
expects an environment that provides a function phi for generating
features from state-action pairs and a function linear_policy for
evaluating the policy. The gridworld package
(https://github.com/stober/gridworld) provides example environments.
Both lspi.py and lstdq.py contain example code that uses a simple
chainworld environment from the original paper (included in the
gridworld package).
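
The following is a minimal, self-contained sketch of how phi and
linear_policy fit into the LSPI loop. It is not the code from lspi.py or
lstdq.py, and it does not use the gridworld package. The deterministic
four-state chain, the one-hot features, and all function signatures
(step, phi, linear_policy, lstdq, lspi) are assumptions made for
illustration only.

```python
import numpy as np

# Hypothetical toy setup, not the chainworld from the gridworld package.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.

    Moves are clamped at the ends; landing in a middle state pays 1.
    """
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if s_next in (1, 2) else 0.0
    return reward, s_next


def phi(s, a):
    """One-hot feature vector for a state-action pair (assumed form)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[a * N_STATES + s] = 1.0
    return f


def linear_policy(w, s):
    """Greedy action under the linear estimate Q(s, a) = phi(s, a) . w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w):
    """Fit weights for the Q-function of the policy that is greedy in w."""
    k = N_STATES * N_ACTIONS
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, linear_policy(w, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += f * r
    # Least-squares solve; also tolerates a singular A.
    return np.linalg.lstsq(A, b, rcond=None)[0]


def lspi(samples, max_iter=20, tol=1e-6):
    """Repeat LSTDQ until the weights stop changing (or max_iter is hit)."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# One sample (s, a, r, s') for every state-action pair.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)]
weights = lspi(samples)
greedy = [linear_policy(weights, s) for s in range(N_STATES)]
```

With one-hot features and a sample for every state-action pair, each
LSTDQ call evaluates the current policy exactly, so this sketch reduces
to ordinary policy iteration. On this toy chain the greedy policy moves
toward the two middle states from either end.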