# MDP Planning
## About The Project
This project implements different algorithms to compute an optimal policy for a given MDP. Using these algorithms, we built a maze solver that finds the shortest path in a given maze.
## Algorithms

The algorithms implemented are:
- Value Iteration
- Howard's Policy Iteration
- Linear Programming
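As a sketch of how the first of these works: value iteration repeatedly applies the Bellman optimality update, V(s) ← max_a Σ_s' T(s,a,s')·(R(s,a,s') + γ·V(s')), until the values stop changing. Below is a minimal NumPy sketch; the `T[s][a][s']` / `R[s][a][s']` array layout is an assumption for illustration, not necessarily this repository's exact convention.

```python
import numpy as np

def value_iteration(S, A, T, R, gamma, tol=1e-10):
    """Minimal value-iteration sketch.

    Assumes T[s][a][s2] is the transition probability and R[s][a][s2]
    the reward (this indexing is an assumption, not the repo's spec).
    """
    V = np.zeros(S)
    while True:
        # Q[s, a] = sum_s2 T[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new
```

For gamma < 1 the update is a contraction, so the loop converges geometrically regardless of the initial values.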
## Built With

The code is written in Python. It uses the following libraries:
- NumPy
- PuLP
## Getting Started

### Prerequisites

- NumPy

  ```sh
  $ pip3 install numpy
  ```

- PuLP

  ```sh
  $ pip3 install pulp
  ```
### Installation

- Clone the repo:

  ```sh
  $ git clone https://github.com/rishiagarwal2000/mdp-planning.git
  ```
## Usage

- To run an MDP planning algorithm:

  ```sh
  $ python3 planner.py --mdp MDP --algorithm ALGORITHM
  ```

  Here is an example:

  ```sh
  $ python3 planner.py --mdp ../data/mdp/continuing-mdp-10-5.txt --algorithm vi
  ```

  Refer to the Algorithms section above for the list of implemented algorithms.

- To solve a maze:

  1. Encode the maze into an MDP:

     ```sh
     $ python3 encoder.py --grid GRID > MDPFILE
     ```

     Here is an example:

     ```sh
     $ python3 encoder.py --grid ../data/maze/grid10.txt > mdpFile
     ```

  2. Solve the MDP using one of the implemented algorithms:

     ```sh
     $ python3 planner.py --mdp mdpFile --algorithm vi > VALUE_AND_POLICY_FILE
     ```

     Here is an example:

     ```sh
     $ python3 planner.py --mdp mdpFile --algorithm vi > value_and_policy_file
     ```

  3. Decode the output policy to print the optimal path:

     ```sh
     $ python3 decoder.py --grid GRID --value_policy VALUE_AND_POLICY_FILE
     ```

     Here is an example:

     ```sh
     $ python3 decoder.py --grid ../data/maze/grid10.txt --value_policy value_and_policy_file
     ```
## MDP File Format

Each MDP is provided as a text file in the following format:

```
numStates S
numActions A
start st
end ed1 ed2 ... edn
transition s1 ac s2 r p
transition s1 ac s2 r p
...
transition s1 ac s2 r p
mdptype mdptype
discount gamma
```
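For concreteness, a hypothetical two-state, two-action episodic MDP written in this format might look as follows. The values are purely illustrative and are not taken from the provided datasets.

```
numStates 2
numActions 2
start 0
end 1
transition 0 0 0 0 1.0
transition 0 1 1 1 0.5
transition 0 1 0 0 0.5
mdptype episodic
discount 0.9
```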
## Maze File Format

Each maze is provided as a text file containing a rectangular grid of 0s, 1s, 2s, and 3s. Here 0 denotes an empty tile, 1 denotes an obstruction/wall, 2 denotes the start state, and 3 denotes an end state.
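For example, a small 5x5 maze in this encoding could look like the grid below (an illustrative example, not one of the provided grids; whether tiles are space-separated in the datasets is an assumption here):

```
1 1 1 1 1
1 2 0 0 1
1 1 1 0 1
1 3 0 0 1
1 1 1 1 1
```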
Acknowledgments: The file formats and test datasets were provided as a part of an assignment in the course CS747.
## Adding a new algorithm

Add your algorithm to the dispatch in the following function:

```python
def plan(f, algo):
    # Read the MDP description from file f
    S, A, T, R, gamma, st, end = readMDP(f)
    v, p = [], []
    if algo == 'vi':
        v, p = vi(S, A, T, R, gamma)
    elif algo == 'hpi':
        v, p = hpi(S, A, T, R, gamma)
    elif algo == 'lp':
        v, p = lp(S, A, T, R, gamma)
    else:
        print("Algo not found")
        return
    # Print the optimal value and action for each state
    for value, action in zip(v, p):
        print("{:.6f} {}".format(value, action))
    return
```
I/O specification:

- Inputs of the algorithm:
  - `S` is the number of states in the MDP
  - `A` is the number of actions in the action space
  - `T` is a 3-d array of transition probabilities
  - `R` is a 3-d array of rewards
  - `gamma` is the discount factor
- Output of the algorithm: a tuple of the optimal value function and the optimal policy
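To illustrate this contract, here is a minimal sketch of a new algorithm that could be hooked in as an extra `elif` branch in `plan()`. It implements a simple Howard-style policy iteration with NumPy; the name `my_pi` is hypothetical, the `T[s][a][s']` / `R[s][a][s']` indexing is an assumption, and it assumes `gamma < 1` so the policy-evaluation linear system is non-singular.

```python
import numpy as np

def my_pi(S, A, T, R, gamma):
    """Sketch of a policy-iteration algorithm matching the I/O spec:
    takes S, A, T, R, gamma and returns (values, policy)."""
    T, R = np.asarray(T), np.asarray(R)
    pi = np.zeros(S, dtype=int)              # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * T_pi) V = r_pi exactly
        T_pi = T[np.arange(S), pi]           # (S, S) transitions under pi
        r_pi = np.sum(T_pi * R[np.arange(S), pi], axis=1)
        V = np.linalg.solve(np.eye(S) - gamma * T_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the current values
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V)
        pi_new = Q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return V, pi                     # stable policy => optimal
        pi = pi_new
```

Evaluating the policy with an exact linear solve (rather than iterating to a tolerance) is what makes this Howard-style: each outer iteration strictly improves the policy, so the loop terminates after finitely many policy switches.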
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE` for more information.
## Contact

Rishi Agarwal - rishiapril2000@gmail.com

Project Link: https://github.com/rishiagarwal2000/mdp-planning.git
## Acknowledgements

- Professor Shivaram Kalyanakrishnan, IIT Bombay
- Santhosh Kumar G., IIT Bombay