Value Iteration in C++

This is a C++ implementation of the Value Iteration algorithm for solving Markov Decision Processes (MDPs), specifically applied to a GridWorld environment.
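Under the hood, value iteration repeatedly applies the Bellman optimality backup to every state until the value function stops changing:

    V(s) ← max_a Σ_s' P(s'|s,a) [ R(s,a,s') + γ · V(s') ]

Because the moves in this GridWorld appear to be deterministic (see the sample output below), the sum over successor states reduces to a single term per action.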

Project Structure

cpp/
├── CMakeLists.txt          # Build configuration
├── README.md               # This file
├── include/                # Header files
│   ├── gridworld.h         # GridWorld environment
│   └── value_iteration.h   # Value iteration solver
├── src/                    # Source files
│   ├── gridworld.cpp
│   ├── value_iteration.cpp
│   └── main.cpp            # Program entry point
└── build/                  # Out-of-source build directory (generated)

Features

  • GridWorld Environment: A 3x4 grid world with:

    • Goal state at (0,3) with reward +1.0
    • Wall state at (1,1) that is impassable
    • Trap state at (1,3) with reward -1.0
    • Start state at (2,0)
  • Value Iteration Algorithm: Implements the classic value iteration algorithm with:

    • Configurable discount factor (gamma)
    • Convergence threshold
    • Visualization of value function and policy
  • Policy Extraction: Extracts the optimal greedy policy from the converged value function (a minimal sketch of the update loop and policy extraction follows this list)
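
The core of the algorithm fits in a few dozen lines. The sketch below is self-contained and illustrative only: it hard-codes the 3x4 layout described above and assumes deterministic moves, rewards collected on entering a cell, and only the goal being terminal. These conventions are inferred from the sample output further down; the actual classes in gridworld.h and value_iteration.h may organize this differently.

// Illustrative sketch of value iteration on the 3x4 GridWorld described above.
// Conventions (entry rewards, only the goal terminal) are assumptions inferred
// from the sample output, not the repository's actual interfaces.
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    constexpr int ROWS = 3, COLS = 4;
    constexpr double GAMMA = 0.9;   // discount factor
    constexpr double THETA = 1e-6;  // convergence threshold

    // Special cells from the description above.
    auto is_wall = [](int r, int c) { return r == 1 && c == 1; };
    auto is_goal = [](int r, int c) { return r == 0 && c == 3; };
    auto enter_reward = [](int r, int c) {   // reward for entering cell (r, c)
        if (r == 0 && c == 3) return  1.0;   // goal
        if (r == 1 && c == 3) return -1.0;   // trap
        return 0.0;
    };

    const int dr[4] = {-1, 1, 0, 0};         // up, down, left, right
    const int dc[4] = { 0, 0, -1, 1};
    const char* arrows[4] = {"↑", "↓", "←", "→"};

    std::array<std::array<double, COLS>, ROWS> V{};  // value function, starts at 0

    // Value iteration: sweep all states until the largest change drops below THETA.
    double delta = 0.0;
    int sweeps = 0;
    do {
        delta = 0.0;
        auto V_new = V;
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < COLS; ++c) {
                if (is_wall(r, c) || is_goal(r, c)) continue;  // no update for wall / terminal
                double best = -1e9;
                for (int a = 0; a < 4; ++a) {
                    int nr = r + dr[a], nc = c + dc[a];
                    // Moves off the grid or into the wall leave the agent in place.
                    if (nr < 0 || nr >= ROWS || nc < 0 || nc >= COLS || is_wall(nr, nc)) {
                        nr = r; nc = c;
                    }
                    best = std::max(best, enter_reward(nr, nc) + GAMMA * V[nr][nc]);
                }
                V_new[r][c] = best;
                delta = std::max(delta, std::abs(best - V[r][c]));
            }
        V = V_new;
        ++sweeps;
    } while (delta > THETA);
    std::printf("Converged after %d sweeps\n", sweeps);

    // Converged value function (wall and goal simply keep their initial 0 here).
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c) std::printf("%8.3f", V[r][c]);
        std::printf("\n");
    }

    // Greedy policy extraction: pick the action with the best one-step lookahead value.
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c) {
            if (is_wall(r, c)) { std::printf(" WALL"); continue; }
            if (is_goal(r, c)) { std::printf(" GOAL"); continue; }
            int best_a = 0;
            double best_q = -1e9;
            for (int a = 0; a < 4; ++a) {
                int nr = r + dr[a], nc = c + dc[a];
                if (nr < 0 || nr >= ROWS || nc < 0 || nc >= COLS || is_wall(nr, nc)) {
                    nr = r; nc = c;
                }
                double q = enter_reward(nr, nc) + GAMMA * V[nr][nc];
                if (q > best_q) { best_q = q; best_a = a; }
            }
            std::printf("%5s", arrows[best_a]);
        }
        std::printf("\n");
    }
    return 0;
}

With GAMMA = 0.9 this sketch should settle on the same converged values shown in the Output section below (0.656 through 1.000), although the printed iteration count may differ from the repository's program.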

Build Instructions

Prerequisites

  • CMake 3.10 or higher
  • C++17 compatible compiler (GCC, Clang, or MSVC)
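
The repository already ships a CMakeLists.txt; if you need to recreate it, a minimal configuration along the following lines should be enough (the actual file may set additional options):

# Minimal build configuration sketch; target and source names follow the
# project structure shown above.
cmake_minimum_required(VERSION 3.10)
project(value_iteration CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

include_directories(include)

add_executable(value_iteration
    src/main.cpp
    src/gridworld.cpp
    src/value_iteration.cpp)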

Building the Project

  1. Navigate to the project directory:

    cd ch04_cpp
  2. Create and enter the build directory:

    mkdir -p build
    cd build
  3. Generate build files with CMake:

    cmake ..
  4. Compile the project:

    make
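
On generators that do not produce Makefiles (for example Visual Studio with MSVC), cmake --build . can be used in place of make.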

Running the Program

After successful compilation, run the executable:

./value_iteration

Output

A typical run prints the grid layout, the value function after each sweep, and the final greedy policy:

Running Value Iteration with gamma = 0.9
Grid layout:
- Goal state: (0,3) with reward +1.0
- Wall state: (1,1) - impassable
- Trap state: (1,3) with reward -1.0
- Start state: (2,0)

Iteration 0:

Value Function:
+---------+---------+---------+---------+
|   0.000 |   0.000 |   0.000 |   0.000 |
+---------+---------+---------+---------+
|   0.000 |  WALL   |   0.000 |   0.000 |
+---------+---------+---------+---------+
|   0.000 |   0.000 |   0.000 |   0.000 |
+---------+---------+---------+---------+

Iteration 1:

Value Function:
+---------+---------+---------+---------+
|   0.000 |   0.000 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.000 |  WALL   |   0.000 |   1.000 |
+---------+---------+---------+---------+
|   0.000 |   0.000 |   0.000 |   0.000 |
+---------+---------+---------+---------+

Iteration 2:

Value Function:
+---------+---------+---------+---------+
|   0.000 |   0.900 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.000 |  WALL   |   0.900 |   1.000 |
+---------+---------+---------+---------+
|   0.000 |   0.000 |   0.000 |   0.000 |
+---------+---------+---------+---------+

Iteration 3:

Value Function:
+---------+---------+---------+---------+
|   0.810 |   0.900 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.000 |  WALL   |   0.900 |   1.000 |
+---------+---------+---------+---------+
|   0.000 |   0.000 |   0.810 |   0.000 |
+---------+---------+---------+---------+

Iteration 4:

Value Function:
+---------+---------+---------+---------+
|   0.810 |   0.900 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.729 |  WALL   |   0.900 |   1.000 |
+---------+---------+---------+---------+
|   0.000 |   0.729 |   0.810 |   0.729 |
+---------+---------+---------+---------+

Iteration 5:

Value Function:
+---------+---------+---------+---------+
|   0.810 |   0.900 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.729 |  WALL   |   0.900 |   1.000 |
+---------+---------+---------+---------+
|   0.656 |   0.729 |   0.810 |   0.729 |
+---------+---------+---------+---------+

Converged after 6 iterations with delta = 0.000

Extracting greedy policy...

Final Value Function and Policy:

Value Function:
+---------+---------+---------+---------+
|   0.810 |   0.900 |   1.000 |   0.000 |
+---------+---------+---------+---------+
|   0.729 |  WALL   |   0.900 |   1.000 |
+---------+---------+---------+---------+
|   0.656 |   0.729 |   0.810 |   0.729 |
+---------+---------+---------+---------+

Policy (arrows show best action):
+---------+---------+---------+---------+
|   →    |   →    |   →    |  GOAL   |
+---------+---------+---------+---------+
|   ↑    |  WALL   |   ↑    |   ↑    |
+---------+---------+---------+---------+
|   ↑    |   →    |   ↑    |   ←    |
+---------+---------+---------+---------+
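
These values are consistent with deterministic moves and a reward collected on entering a cell: each state converges to 0.9 raised to one less than its step distance from the goal. For example, the state at (2,3) prefers moving left into (2,2) over stepping up into the trap, since 0 + 0.9 × 0.810 = 0.729 beats -1.0 + 0.9 × 1.000 = -0.100, which matches the 0.729 value and the ← arrow shown above.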