Maze-Solver

Instructions for running the code

To get familiar with the MDP file format, you can view and run generateMDP.py, which is a python script used to generate random MDPs. Specify the number of states and actions, the discount factor, type of mdp (episodic or continuing), and the random seed as command-line arguments to this file. Two examples of how this script can be invoked are given below.

python generateMDP.py --S 2 --A 2 --gamma 0.90 --mdptype episodic --rseed 0
python generateMDP.py --S 50 --A 20 --gamma 0.20 --mdptype continuing --rseed 0

Run any of the following commands to execute the planner.py file and obtain the value function for specified algorithm:

python planner.py --mdp /home/user/data/mdp-4.txt --algorithm vi
python planner.py --mdp /home/user/temp/data/mdp-7.txt --algorithm hpi
python planner.py --mdp /home/user/mdpfiles/mdp-5.txt --algorithm lp

Run the following command to visualise the maze: python visualize.py gridfile

Run the following command to visualise the solution of the maze solver: python visualize.py gridfile pathfile

Run the following commands to encode the maze as an MDP, run the planner on the MDP and finally decode the path using the optimal policy obtained.

python encoder.py --grid gridfile > mdpfile
python planner.py --mdp mdpfile --algorithm vi > value_and_policy_file
python decoder.py --grid gridfile --value_policy value_and_policy_file

python PlannerVerifyOutput.py command can be used to tests all three algorithms on the all the MDP instances given in the data/mdp directory.
python PlannerVerifyOutput.py --mdp --algorithm vi command will test only the value iteration algorithm on the all the MDP instances given in the data/mdp directory.
python MazeVerifyOutput.py --mdp --algorithm vi will test all the maze instances that are given in the data/maze directory. Here, --algorithm specifies the algorithm that you would like to run on the MDP generated by encoder.py

report.pdf contains the design decisions, assumptions, and observations about the algorithms. It also mentions formulation of the MDP for the maze problem.

Value Iteration

The vi directory contains the images of the shortest paths for the grids given in the maze folder in the data directory.

It also contains the encoded mdp of the maze as well as the optimal policy generated by the value iteration algorithm for that mdp to find the shortest path between the start and end points. It also contains the output of the path simulated by the optimal policy decoded using decoder.py

Howard's Policy Iteration

The hpi directory contains the images of the shortest paths for the grids given in the maze folder in the data directory.

It also contains the encoded mdp of the maze as well as the optimal policy generated by the howard's policy iteration algorithm for that mdp to find the shortest path between the start and end points. It also contains the output of the path simulated by the optimal policy decoded using decoder.py

Linear Programming

The lp directory contains the images of the shortest paths for the grids given in the maze folder in the data directory.

It also contains the encoded mdp of the maze as well as the optimal policy generated by the linear programming algorithm for that mdp to find the shortest path between the start and end points. It also contains the output of the path simulated by the optimal policy decoded using decoder.py

Results

  1. It was observed that value iteration was the fastest for solving the mazes for shortest paths followed by Howard’s policy Iteration whereas Linear Programming was the slowest.
  2. Value iteration and Howard’s policy iteration gave different shortest paths for the mazes having multiple shortest paths.
  3. For mdps with less than 100 states, linear programming takes the shortest time followed by value iteration whereas howard’s policy iteration takes the longest time to solve the MDP.

Here yellow grid cell denotes the start state and white cell denotes the end state. Note that the maze can have multiple end states and mdp is encoded in such a way that only the shortest path among all the shortest paths to all the end states is printed.

shortest_path shortest_path shortest_path shortest_path shortest_path shortest_path shortest_path shortest_path shortest_path shortest_path

References

  1. https://www.cse.iitb.ac.in/~shivaram/teaching/cs747-a2020/pa-2/programming-assignment-2.html
  2. https://coin-or.github.io/pulp/
  3. https://www.geeksforgeeks.org/python-linear-programming-in-pulp/