/cp-mdp

A CANDECOMP-PARAFAC tensor decomposition method to solve a Markov Decision Process (MDP) gridworld problem.

Primary LanguagePython

A Tensor-based Markov Decision Process Solver (CP-MDP)

CP-MDP is an implementation based on the MICAI 2020 paper:

Daniela Kuinchtner, Felipe Meneguzzi and Afonso Sales.
A Tensor-Based Markov Decision Process Representation
In Advances in Soft Computing (pp. 313-324).

This implementation relies on modifying the pymdptoolbox toolkit, which uses a tabular method to represent MDP transition models and to compute the solution, to a tensor decomposition value iteration and policy iteration compact algorihtms (see mdp.py).

The compact value iteration (CP-MDP-VI), and the compact policy iteration (CP-MDP-PI), are based on the CANDECOMP/PARAFAC tensor decomposition definition (which gives the name "CP-MDP" for our solver) to build the transition models as tensor components in a compact representation.

Using GridWorld problems with n-dimensional sizes, we analyse runtime, memory, and computational cost of our method against tabular ones. We detail the results in the paper CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov Decision Process Multidimensional Problem on arXiv.