Tools for solving an MDP using decomposition. The two main contributions are:
- An implementation of the Attend, Adapt and Transfer (A2T) network for Q-learning: https://arxiv.org/abs/1510.02879
- An implementation of Monte-Carlo policy evaluation
- For A2T, construct an `A2TNetwork` by defining a base network, an attention network, and a list of functions that compute estimates of the Q values (either from previous solutions or from subproblems); a construction sketch is given after this list.
- For Monte-Carlo policy evaluation, see `examples/failure_estimation.jl` for how to compute the probability of failure using this approach; a minimal sketch also follows below.
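As a rough illustration of the A2T idea, the sketch below mixes a learnable base network's Q estimates with frozen source estimators using softmax attention weights. The `A2TSketch` type, its field names, and the toy estimators `Q1`/`Q2` are assumptions for illustration, not the package's actual `A2TNetwork` API:

```julia
using Flux

# A minimal A2T-style sketch; the package's real A2TNetwork may differ.
struct A2TSketch{B,A}
    base::B               # base network, learned for the target task
    attn::A               # attention network producing mixing weights
    sources::Vector{Any}  # frozen functions s -> Q-value estimates
end

# Combine estimates: the attention weights mix each source estimator
# with the base network (the last weight belongs to the base network).
function (m::A2TSketch)(s)
    w = m.attn(s)
    q = w[end] .* m.base(s)
    for (i, K) in enumerate(m.sources)
        q = q .+ w[i] .* K(s)
    end
    return q
end

n_states, n_actions = 4, 2
Q1(s) = Float32[sum(s), -sum(s)]         # toy estimate from a prior solution
Q2(s) = Float32[maximum(s), minimum(s)]  # toy estimate from a subproblem

base = Chain(Dense(n_states, 32, relu), Dense(32, n_actions))
attn = Chain(Dense(n_states, 16, relu), Dense(16, 3), softmax)  # 2 sources + base

net = A2TSketch(base, attn, Any[Q1, Q2])
net(rand(Float32, n_states))             # vector of Q-value estimates
```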
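And a self-contained sketch of estimating a failure probability by Monte-Carlo policy evaluation: simulate many rollouts under a fixed policy and report the fraction that reach a failure state before the horizon. The transition model, policy, and failure condition here are made up for illustration; `examples/failure_estimation.jl` is the authoritative example:

```julia
# Toy 1-D random-walk problem: state is a position; failure if it drops below 0.
transition(s, a) = s + a + 0.5 * randn()  # stochastic transition (assumed)
policy(s) = s < 5.0 ? 1.0 : -1.0          # hypothetical fixed policy
isfailure(s) = s < 0.0                    # hypothetical failure condition

# Monte-Carlo estimate of P(failure) from initial state s0.
function failure_probability(s0; n_episodes = 10_000, horizon = 50)
    failures = 0
    for _ in 1:n_episodes
        s = s0
        for _ in 1:horizon
            s = transition(s, policy(s))
            if isfailure(s)
                failures += 1
                break
            end
        end
    end
    return failures / n_episodes
end

p_fail = failure_probability(2.0)  # fraction of rollouts that failed
```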
Maintained by Anthony Corso (acorso@stanford.edu)