
MDP Planning

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Adding a new algorithm
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements

About The Project

This project implements different algorithms to compute an optimal policy for a given MDP.

Using these algorithms, we built a maze solver that finds the shortest path in a given maze.

Algorithms

The algorithms implemented are:

  • Value Iteration
  • Howard's Policy Iteration
  • Linear Programming

Built With

The code is written in Python. It uses the following libraries:

  • NumPy
  • PuLP

Getting Started

Prerequisites

  • NumPy

    $ pip3 install numpy
  • PuLP

    $ pip3 install pulp

Installation

  1. Clone the repo

    $ git clone https://github.com/rishiagarwal2000/mdp-planning.git 

Usage

  • To use MDP planning algorithms:

    $ python3 planner.py --mdp MDP --algorithm ALGORITHM 

    Here is an example:

    $ python3 planner.py --mdp ../data/mdp/continuing-mdp-10-5.txt --algorithm vi

    Refer to Algorithms for the list of implemented algorithms; the corresponding command-line values are vi (Value Iteration), hpi (Howard's Policy Iteration), and lp (Linear Programming). The planner prints one line per state containing the optimal value and the optimal action.

  • To solve a maze:

    1. Encode the maze into an MDP
    $ python3 encoder.py --grid GRID > MDPFILE

    Here is an example:

    $ python3 encoder.py --grid ../data/maze/grid10.txt > mdpFile

    2. Solve the MDP using one of the mentioned algorithms
    $ python3 planner.py --mdp mdpFile --algorithm vi > VALUE_AND_POLICY_FILE

    Here is an example:

    $ python3 planner.py --mdp mdpFile --algorithm vi > value_and_policy_file

    3. Decode the output policy to print the optimal path
    $ python3 decoder.py --grid GRID --value_policy VALUE_AND_POLICY_FILE

    Here is an example:

    $ python3 decoder.py --grid ../data/maze/grid10.txt --value_policy value_and_policy_file
  • MDP File Format

    Each MDP is provided as a text file in the following format; an illustrative example is given below.

    numStates S
    numActions A
    start st
    end ed1 ed2 ... edn
    transition s1 ac s2 r p
    transition s1 ac s2 r p
    . . .
    . . .
    . . .
    transition s1 ac s2 r p
    mdptype mdptype
    discount gamma

  • Maze File Format

    Each maze is provided as a text file containing a rectangular grid of 0s, 1s, 2s, and 3s, where 0 denotes an empty tile, 1 denotes an obstruction/wall, 2 denotes the start state, and 3 denotes an end state. An illustrative example is given in the next item.
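
  • Examples of the file formats

    Below is a small, purely illustrative MDP file in the format above. The counts, rewards, and probabilities are made up (they are not taken from the provided datasets), and mdptype is assumed to take values such as episodic or continuing, as suggested by the dataset file names:

    numStates 2
    numActions 1
    start 0
    end 1
    transition 0 0 1 1.0 0.8
    transition 0 0 0 0.0 0.2
    mdptype episodic
    discount 0.9

    Each transition line reads: taking action ac in state s1 leads to state s2 with probability p and yields reward r.

    A maze file is a grid of the digits described above, for example (the cell spacing here is illustrative; follow the files in ../data/maze/ for the exact layout):

    1 1 1 1 1
    1 2 0 0 1
    1 1 1 0 1
    1 0 0 3 1
    1 1 1 1 1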

Acknowledgments: The file formats and test datasets were provided as a part of an assignment in the course CS747.

Adding a new algorithm

Add your algorithm as a new branch in the following function:

def plan(f, algo):
	# Parse the MDP file into its components.
	S, A, T, R, gamma, st, end = readMDP(f)
	v, p = [], []
	# Dispatch on the chosen algorithm; add a new elif branch here for your algorithm.
	if algo == 'vi':
		v, p = vi(S, A, T, R, gamma)
	elif algo == 'hpi':
		v, p = hpi(S, A, T, R, gamma)
	elif algo == 'lp':
		v, p = lp(S, A, T, R, gamma)
	else:
		print("Algo not found")
		return

	# Print the optimal value and the optimal action for each state.
	for value, action in zip(v, p):
		print("{:.6f} {}".format(value, action))

	return

I/O specification:

  • Inputs of the algorithm:

    • S is the number of states in the MDP
    • A is the number of actions in the action space
    • T is a 3-d array of transition probabilities
    • R is a 3-d array of rewards
    • gamma is the discount factor
  • The algorithm should return a tuple of the optimal value function and an optimal policy, each with one entry per state (a minimal sketch is given below)
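
Here is a minimal sketch of a new algorithm that respects this interface. The function name myAlgo and the indexing T[s][a][s'], R[s][a][s'] are assumptions made for illustration and are not part of the repository's code; the body simply runs a fixed number of Bellman backups to show the expected input and output shapes, and a real algorithm would replace it:

import numpy as np

def myAlgo(S, A, T, R, gamma):
	# Illustrative sketch only; assumes T and R are indexed as T[s][a][s'] and R[s][a][s'].
	T = np.asarray(T)
	R = np.asarray(R)
	v = np.zeros(S)
	for _ in range(100):
		# Q(s, a) = sum over s' of T[s][a][s'] * (R[s][a][s'] + gamma * v[s'])
		q = np.sum(T * (R + gamma * v), axis=2)
		v = q.max(axis=1)
	p = q.argmax(axis=1)
	# One value and one action per state.
	return list(v), list(p)

To wire it in, add a branch such as elif algo == 'myalgo': v, p = myAlgo(S,A,T,R,gamma) to plan, and run the planner with --algorithm myalgo.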

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Rishi Agarwal - rishiapril2000@gmail.com

Project Link: https://github.com/rishiagarwal2000/mdp-planning.git

Acknowledgements