4x3_world_AI_Textbook
This code calculates the utility of each state in the "4x3 world" problem using Value Iteration.
More detailed information is on pages 645 to 653 of the AI textbook (Artificial Intelligence: A Modern Approach, 3rd Edition).
Download Link : https://www.ics.uci.edu/~rickl/courses/cs-171/aima-resources/Artificial%20Intelligence%20A%20Modern%20Approach%20(3rd%20Edition).pdf
Visualization of 4x3 world problem
3 [__, __, __, +1]
2 [__, wa, __, -1]
1 [sp, __, __, __]
1 2 3 4
Description
sp : a start point (1, 1)
wa : a wall which is blocked
+1 : reward +1
-1 : reward -1
__ : an open cell that the agent can occupy
Coordinate order: (column, row), i.e., horizontal first, vertical second, counted from the bottom-left
e.g., +1 => (4, 3), -1 => (4, 2)
Action
The agent moves one step per action.
The available actions are "Up", "Right", "Down", and "Left".
Noise
The agent's moves are noisy: the intended step succeeds only part of the time.
Go straight with probability 0.8: P(gs) = 0.8 (the intended direction)
Left unintentionally with probability 0.1: P(l) = 0.1 (noise, 90 degrees left of the intended direction)
Right unintentionally with probability 0.1: P(r) = 0.1 (noise, 90 degrees right of the intended direction)
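The noise model above can be sketched as a small transition function. This is a minimal illustration, not the repository's actual code: the names `ACTIONS`, `move`, and `transitions` are hypothetical, and the rule that a blocked move leaves the agent in place is assumed from the textbook's description of the problem.

```python
# Assumed conventions (not taken from the original code): states are (x, y)
# tuples, actions are unit vectors, and a blocked move leaves the agent in place.
ACTIONS = {"Up": (0, 1), "Right": (1, 0), "Down": (0, -1), "Left": (-1, 0)}
WALL = (2, 2)
WIDTH, HEIGHT = 4, 3

def move(state, delta):
    """Deterministic step; stay put if the target is the wall or off the grid."""
    x, y = state[0] + delta[0], state[1] + delta[1]
    if (x, y) == WALL or not (1 <= x <= WIDTH and 1 <= y <= HEIGHT):
        return state
    return (x, y)

def transitions(state, action):
    """Return {next_state: probability} for one noisy move."""
    dx, dy = ACTIONS[action]
    left = (-dy, dx)   # 90 degrees counter-clockwise from the intended direction
    right = (dy, -dx)  # 90 degrees clockwise from the intended direction
    dist = {}
    for prob, delta in ((0.8, (dx, dy)), (0.1, left), (0.1, right)):
        nxt = move(state, delta)
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist
```

For example, `transitions((1, 1), "Up")` assigns 0.8 to (1, 2), 0.1 to (2, 1), and 0.1 to staying at (1, 1), because the leftward slip runs off the grid.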
Reward
Except for the terminal states (4, 3) and (4, 2), whose rewards are +1 and -1, R(s) = -0.04
Optimal Policy
An optimal policy for R(s) = -0.04 is
3 [Ri, Ri, Ri, +1]
2 [Up, xx, Up, -1]
1 [Up, Le, Le, Le]
1 2 3 4
Ri : Right, Le : Left, xx : wall
Value Iteration Pseudo code
Refer to page 653 of the AI textbook.
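As a rough companion to the textbook pseudocode, here is a minimal Python sketch of value iteration for this grid. It is not the repository's actual implementation. The discount factor `GAMMA = 1.0` and the stopping threshold `EPSILON` are assumptions (neither is stated in this README), and all function names are hypothetical. A blocked move is assumed to leave the agent in place.

```python
GAMMA = 1.0      # assumed discount factor (not stated in this README)
EPSILON = 1e-4   # assumed stopping threshold (hypothetical)
STEP_REWARD = -0.04
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
WALL = (2, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # Up, Right, Down, Left

def move(state, delta):
    """Deterministic step; stay put if the target is the wall or off the grid."""
    nxt = (state[0] + delta[0], state[1] + delta[1])
    return nxt if nxt in STATES else state

def expected_utility(state, action, U):
    """Expected utility of the next state under the 0.8 / 0.1 / 0.1 noise model."""
    dx, dy = action
    outcomes = ((0.8, (dx, dy)), (0.1, (-dy, dx)), (0.1, (dy, -dx)))
    return sum(p * U[move(state, d)] for p, d in outcomes)

def value_iteration():
    """Repeat Bellman updates until the largest change falls below EPSILON."""
    U = {s: 0.0 for s in STATES}
    iterations = 0
    while True:
        new_U = {}
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                new_U[s] = TERMINALS[s]  # terminal utility is just its reward
            else:
                best = max(expected_utility(s, a, U) for a in ACTIONS)
                new_U[s] = STEP_REWARD + GAMMA * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        iterations += 1
        if delta < EPSILON:
            return U, iterations

U, iterations = value_iteration()
for y in (3, 2, 1):  # print the top row first, matching the grid layout
    print([round(U[(x, y)], 3) if (x, y) in U else None for x in range(1, 5)])
```

The number of iterations this sketch needs depends on the assumed threshold and on the zero initialization, so it need not match the count reported for the original code.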
Value Iteration Results
Iterations to converge : 33
[0.811 0.868 0.918 1.0]
[0.761 _____ 0.66 -1.0]
[0.705 0.655 0.611 0.387]
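The utilities above can be checked against the optimal policy listed earlier by choosing, in each state, the action with the highest expected utility. The sketch below is illustrative only: the function names are hypothetical, `Do` is an assumed abbreviation for Down, and a blocked move is assumed to leave the agent in place.

```python
# Utilities copied from the results grid above, keyed by (x, y).
U = {
    (1, 3): 0.811, (2, 3): 0.868, (3, 3): 0.918, (4, 3): 1.0,
    (1, 2): 0.761,                (3, 2): 0.66,  (4, 2): -1.0,
    (1, 1): 0.705, (2, 1): 0.655, (3, 1): 0.611, (4, 1): 0.387,
}
TERMINALS = {(4, 3), (4, 2)}
# "Do" (Down) is a hypothetical abbreviation; the others follow the README.
ACTIONS = {"Up": (0, 1), "Ri": (1, 0), "Do": (0, -1), "Le": (-1, 0)}

def move(state, delta):
    """Deterministic step; stay put if the target is the wall or off the grid."""
    nxt = (state[0] + delta[0], state[1] + delta[1])
    return nxt if nxt in U else state

def expected_utility(state, delta):
    """Expected utility of the next state under the 0.8 / 0.1 / 0.1 noise model."""
    dx, dy = delta
    outcomes = ((0.8, (dx, dy)), (0.1, (-dy, dx)), (0.1, (dy, -dx)))
    return sum(p * U[move(state, d)] for p, d in outcomes)

def greedy_policy():
    """Pick the best action name for every non-terminal state."""
    return {
        s: max(ACTIONS, key=lambda a: expected_utility(s, ACTIONS[a]))
        for s in U if s not in TERMINALS
    }

def label(policy, x, y):
    """Cell text for printing: action name, terminal reward, or wall marker."""
    if (x, y) in policy:
        return policy[(x, y)]
    return f"{U[(x, y)]:+.0f}" if (x, y) in U else "xx"

policy = greedy_policy()
for y in (3, 2, 1):  # top row first, matching the grid layout
    print(y, [label(policy, x, y) for x in range(1, 5)])
```

With these rounded utilities, the greedy choice in every non-terminal state agrees with the optimal policy listed in the Optimal Policy section.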