/4x3_world_AI_Textbook

a MDP example that has a 4x3 grid, in AI TextBook.

Primary LanguagePython

4x3_world_AI_Textbook

This code is to calculate the utilities at each state for "4x3 world" problem using Value Iteration.
The more detailed information is at the page 645 to 653 in AI textbook.
Download Link : https://www.ics.uci.edu/~rickl/courses/cs-171/aima-resources/Artificial%20Intelligence%20A%20Modern%20Approach%20(3rd%20Edition).pdf

Visualization of 4x3 world problem

3 [__, __, __, +1]
2 [__, wa, __, -1]
1 [sp, __, __, __]
    1   2   3   4
  • Description
    sp : a start point (1, 1)
    wa : a wall which is blocked
    +1 : reward +1
    -1 : reward -1
    __ : a space which the agent can stay

    coordinate order: left, up i.e. horizon first, verticality second
    e.g., +1 => (4, 3) -1 => (4, 2)

  • Action
    An agent can act to go one step.
    The cases are "Up", "Right", "Down", "Left"

  • Noisy
    An agent do action to go one step, and it has noisies,
    Go straight with probability 0.8 P(gs) = 0.8 That is an intention to go forward
    Left unintentionally with probability 0.1 P(l) = 0.1 An noisy to go left unintentionally
    Right unintentionally with probability 0.1 P(r) = 0.1 Ao noisy to go right unintentionally

  • Reward
    Except the states (4,3) and (4,2) (those are +1, -1), R(s) = -0.04

  • Optimal Policy
    An Optimal policy for the R(s) = -0.04 is

    3 [Ri, Ri, Ri,  1]
    2 [Up, xx, Up, -1]
    1 [Up, Le, Le, Le]
        1   2   3   4
    

    Ri : Right, Le : Left

Value Iteration Pseudo code

refer to the page 653 in AI textbook.

Value Iteration Results

Iteration to converge : 33

[0.811  0.868  0.918   1.0]
[0.761  _____  0.66   -1.0]
[0.705  0.655  0.611 0.387]