# MDP Planning
## About The Project
This project implements different algorithms to compute an optimal policy for a given MDP. Using these algorithms, we built a maze solver that finds the shortest path in a given maze.
## Algorithms

The algorithms implemented are:
- Value Iteration
- Howard's Policy Iteration
- Linear Programming
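As a sketch of how the first of these works: value iteration repeatedly applies the Bellman optimality update, V(s) ← max_a Σ_s' T(s,a,s')·(R(s,a,s') + γ·V(s')), until the values stop changing. Below is a minimal NumPy sketch; the `T[s][a][s']` / `R[s][a][s']` array layout is an assumption for illustration, not necessarily this repository's exact convention.

```python
import numpy as np

def value_iteration(S, A, T, R, gamma, tol=1e-10):
    """Minimal value-iteration sketch.

    Assumes T[s][a][s2] is the transition probability and R[s][a][s2]
    the reward (this indexing is an assumption, not the repo's spec).
    """
    V = np.zeros(S)
    while True:
        # Q[s, a] = sum_s2 T[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new
```

For gamma < 1 the update is a contraction, so the loop converges geometrically regardless of the initial values.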
## Built With

The code is written in Python. It uses the following libraries:
- NumPy
- PuLP
## Getting Started

### Prerequisites

- NumPy

  ```sh
  $ pip3 install numpy
  ```

- PuLP

  ```sh
  $ pip3 install pulp
  ```
### Installation

- Clone the repo:

  ```sh
  $ git clone https://github.com/rishiagarwal2000/mdp-planning.git
  ```
## Usage

- To run an MDP planning algorithm:

  ```sh
  $ python3 planner.py --mdp MDP --algorithm ALGORITHM
  ```

  Here is an example:

  ```sh
  $ python3 planner.py --mdp ../data/mdp/continuing-mdp-10-5.txt --algorithm vi
  ```

  Refer to the Algorithms section above for the list of implemented algorithms.

- To solve a maze:

  1. Encode the maze into an MDP:

     ```sh
     $ python3 encoder.py --grid GRID > MDPFILE
     ```

     Here is an example:

     ```sh
     $ python3 encoder.py --grid ../data/maze/grid10.txt > mdpFile
     ```

  2. Solve the MDP using one of the implemented algorithms:

     ```sh
     $ python3 planner.py --mdp mdpFile --algorithm vi > VALUE_AND_POLICY_FILE
     ```

     Here is an example:

     ```sh
     $ python3 planner.py --mdp mdpFile --algorithm vi > value_and_policy_file
     ```

  3. Decode the output policy to print the optimal path:

     ```sh
     $ python3 decoder.py --grid GRID --value_policy VALUE_AND_POLICY_FILE
     ```

     Here is an example:

     ```sh
     $ python3 decoder.py --grid ../data/maze/grid10.txt --value_policy value_and_policy_file
     ```
## MDP File Format

Each MDP is provided as a text file in the following format:

```
numStates S
numActions A
start st
end ed1 ed2 ... edn
transition s1 ac s2 r p
transition s1 ac s2 r p
...
transition s1 ac s2 r p
mdptype mdptype
discount gamma
```
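For concreteness, a hypothetical two-state, two-action episodic MDP written in this format might look as follows. The values are purely illustrative and are not taken from the provided datasets.

```
numStates 2
numActions 2
start 0
end 1
transition 0 0 0 0 1.0
transition 0 1 1 1 0.5
transition 0 1 0 0 0.5
mdptype episodic
discount 0.9
```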
## Maze File Format

Each maze is provided as a text file containing a rectangular grid of 0s, 1s, 2s, and 3s. Here 0 denotes an empty tile, 1 denotes an obstruction/wall, 2 denotes the start state, and 3 denotes an end state.
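For example, a small 5x5 maze in this encoding could look like the grid below (an illustrative example, not one of the provided grids; whether tiles are space-separated in the datasets is an assumption here):

```
1 1 1 1 1
1 2 0 0 1
1 1 1 0 1
1 3 0 0 1
1 1 1 1 1
```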
Acknowledgments: The file formats and test datasets were provided as a part of an assignment in the course CS747.
## Adding a new algorithm

Add your algorithm to the dispatch in the following function:

```python
def plan(f, algo):
    # Read the MDP description from file f
    S, A, T, R, gamma, st, end = readMDP(f)
    v, p = [], []
    if algo == 'vi':
        v, p = vi(S, A, T, R, gamma)
    elif algo == 'hpi':
        v, p = hpi(S, A, T, R, gamma)
    elif algo == 'lp':
        v, p = lp(S, A, T, R, gamma)
    else:
        print("Algo not found")
        return
    # Print the optimal value and action for each state
    for value, action in zip(v, p):
        print("{:.6f} {}".format(value, action))
    return
```
I/O specification:

- Inputs of the algorithm:
  - `S` is the number of states in the MDP
  - `A` is the number of actions in the action space
  - `T` is a 3-d array of transition probabilities
  - `R` is a 3-d array of rewards
  - `gamma` is the discount factor
- Output of the algorithm: a tuple of the optimal value function and the optimal policy
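To illustrate this contract, here is a minimal sketch of a new algorithm that could be hooked in as an extra `elif` branch in `plan()`. It implements a simple Howard-style policy iteration with NumPy; the name `my_pi` is hypothetical, the `T[s][a][s']` / `R[s][a][s']` indexing is an assumption, and it assumes `gamma < 1` so the policy-evaluation linear system is non-singular.

```python
import numpy as np

def my_pi(S, A, T, R, gamma):
    """Sketch of a policy-iteration algorithm matching the I/O spec:
    takes S, A, T, R, gamma and returns (values, policy)."""
    T, R = np.asarray(T), np.asarray(R)
    pi = np.zeros(S, dtype=int)              # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * T_pi) V = r_pi exactly
        T_pi = T[np.arange(S), pi]           # (S, S) transitions under pi
        r_pi = np.sum(T_pi * R[np.arange(S), pi], axis=1)
        V = np.linalg.solve(np.eye(S) - gamma * T_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the current values
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V)
        pi_new = Q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return V, pi                     # stable policy => optimal
        pi = pi_new
```

Evaluating the policy with an exact linear solve (rather than iterating to a tolerance) is what makes this Howard-style: each outer iteration strictly improves the policy, so the loop terminates after finitely many policy switches.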
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE` for more information.
## Contact

Rishi Agarwal - rishiapril2000@gmail.com

Project Link: https://github.com/rishiagarwal2000/mdp-planning.git
## Acknowledgements

- Professor Shivaram Kalyanakrishnan, IIT Bombay
- Santhosh Kumar G., IIT Bombay