Decomposing Markov Decision Processes: A Novel Solution Method

This archive is distributed under the MIT License.

The source code and data in this repository are a snapshot of the software and data that were used in the research reported on in the manuscript under review Decomposing Markov Decision Processes: A Novel Solution Method by Z. Liu et. al. The data generated for this study are included with the codes.

Cite

To cite this repository, please cite the manuscript.

Below is the BibTex for citing the manuscript.

@article{Liu2021,
  title={Decomposing Markov Decision Processes: A Novel Solution Method},
  author={Liu, Zeyu and Li, Xueping and Khojandi, Anahita},
  journal={Preprint},
  year={2021},
  url={https://www.researchgate.net/publication/353295872_A_Benders_Decomposition_Method_for_Markov_Decision_Processes}
}

Description

The goal of this repository is to solve Markov decision processes (MDP) with the Benders decomposition. Our proposed algorithm, the MDP-Benders algorithm, decomposes the linear programming formulation of MDP into a master problem and multiple subproblems with respect to the states of MDP. Please refer to the manuscript for further details.

The data used in this study contains a randomly generated MDP problem and five other problems from the literature:

A queueing problem [1];
A bandit machine problem [2];
An inventory management problem [3];
A machine maintenance problem [4];
A data transmission problem [5].

Python Prerequisite

The following Python libraries are required to run the source codes:

numpy;
scipy;
pickle;
gurobipy;
mdptoolbox.

Usage

General MDP

Run decomposition() function in the main.py file in the scripts/constrained_MDP folder. Different problems can be set up using the provided variables instance and params. All six problems are available. The following describes the parameters for each problem:

random: params = ( number of states, number of actions );
queue: params = ( length of queue, number of service modes, arrival probability );
bandit: params = ( number of arms, number of states for each arm );
inventory: params = ( inventory capacity );
replace: params = ( number of states of the machine, number of maintenance options );
transmission params = ( number of channels, number of data packages, number of transmission modes ).

MDP with Monotone Optimal Policy

Run decomposition_monotone() function in the main.py file in the scripts/constrained_MDP folder. Different problems can be set up using the provided variables instance and params. The queue, maintain and transmit problems are available.

Constrained MDP

Run the main.py file in the scripts/constrained_MDP folder. Different problems can be set up using the provided variables instance and params. All six problems are available.

Support

For support in using this software, submit an issue.

Reference

[1] de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Operations research 51(6):850-865.

[2] Bertsimas D, Misic VV (2016) Decomposable markov decision processes: A fluid optimization approach. Operations Research 64(6):1537-1555.

[3] Lee I, Epelman MA, Romeijn HE, Smith RL (2017) Simplex algorithm for countable-state discounted markov decision processes. Operations Research 65(4):1029-1042.

[4] Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons).

[5] Krishnamurthy V (2016) Partially observed Markov decision processes (Cambridge university press).

ILABUTK/Benders_Decomposition_for_MDP