This is a C++ implementation of the Value Iteration algorithm for solving Markov Decision Processes (MDPs), specifically applied to a GridWorld environment.
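Value iteration repeatedly applies the Bellman optimality backup to every state until the value function stops changing:

$$V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V_k(s')\bigr]$$

The sample output below is consistent with deterministic transitions, in which case the sum over $s'$ collapses to the single successor state of each action.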
cpp/
├── CMakeLists.txt
├── README.md
├── include/
│   ├── gridworld.h
│   └── value_iteration.h
├── src/
│   ├── gridworld.cpp
│   ├── value_iteration.cpp
│   └── main.cpp
└── build/
GridWorld Environment: A 3x4 grid world with:
- Goal state at (0,3) with reward +1.0
- Wall state at (1,1) that is impassable
- Trap state at (1,3) with reward -1.0
- Start state at (2,0)
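For orientation, this layout could be captured in a small struct like the sketch below; the type and member names (`GridWorldSketch`, `reward`, and so on) are illustrative and are not taken from the actual `gridworld.h`.

```cpp
#include <utility>

// Illustrative sketch of the 3x4 layout; the real gridworld.h may expose a different API.
struct GridWorldSketch {
    static constexpr int kRows = 3;
    static constexpr int kCols = 4;

    std::pair<int, int> goal{0, 3};    // reward +1.0 on entry
    std::pair<int, int> trap{1, 3};    // reward -1.0 on entry
    std::pair<int, int> wall{1, 1};    // impassable cell
    std::pair<int, int> start{2, 0};   // agent's starting cell

    bool isWall(int r, int c) const { return std::pair<int, int>{r, c} == wall; }

    // Reward received when transitioning into cell (r, c).
    double reward(int r, int c) const {
        if (std::pair<int, int>{r, c} == goal) return 1.0;
        if (std::pair<int, int>{r, c} == trap) return -1.0;
        return 0.0;
    }
};
```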
Value Iteration Algorithm: Implements the classic value iteration algorithm with:
- Configurable discount factor (gamma)
- Convergence threshold
- Visualization of value function and policy
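A minimal sketch of the backup loop is shown below. It assumes deterministic moves, a stay-in-place rule when a move hits the wall or the grid edge, and a goal-only terminal state, which is consistent with the sample output further down; the constant names (`gamma`, `theta`) and the overall structure are illustrative rather than the project's actual `value_iteration.cpp`.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    constexpr int R = 3, C = 4;
    constexpr double gamma = 0.9;    // discount factor
    constexpr double theta = 1e-6;   // convergence threshold on the max value change
    const int goalR = 0, goalC = 3, trapR = 1, trapC = 3, wallR = 1, wallC = 1;

    std::array<std::array<double, C>, R> V{};  // value function, initialized to 0

    // Reward received when entering cell (r, c).
    auto reward = [&](int r, int c) {
        if (r == goalR && c == goalC) return 1.0;
        if (r == trapR && c == trapC) return -1.0;
        return 0.0;
    };

    const int dr[4] = {-1, 1, 0, 0};   // up, down, left, right
    const int dc[4] = {0, 0, -1, 1};

    int iter = 0;
    double delta = 0.0;
    do {
        delta = 0.0;
        auto newV = V;  // synchronous backup: read old values, write new ones
        for (int r = 0; r < R; ++r) {
            for (int c = 0; c < C; ++c) {
                if (r == wallR && c == wallC) continue;   // wall has no value
                if (r == goalR && c == goalC) continue;   // terminal goal stays at 0
                double best = -1e18;
                for (int a = 0; a < 4; ++a) {
                    int nr = r + dr[a], nc = c + dc[a];
                    // Moves off the grid or into the wall leave the agent in place.
                    if (nr < 0 || nr >= R || nc < 0 || nc >= C ||
                        (nr == wallR && nc == wallC)) { nr = r; nc = c; }
                    best = std::max(best, reward(nr, nc) + gamma * V[nr][nc]);
                }
                newV[r][c] = best;
                delta = std::max(delta, std::abs(newV[r][c] - V[r][c]));
            }
        }
        V = newV;
        ++iter;
    } while (delta > theta);

    std::printf("Converged after %d iterations (delta = %.3f)\n", iter, delta);
    return 0;
}
```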
Policy Extraction: Extracts the optimal greedy policy from the converged value function
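One way to perform that extraction, assuming a converged value function and the same deterministic dynamics as in the sketch above (the function name `extractGreedyPolicy` is illustrative, not the actual `value_iteration.h` API):

```cpp
#include <array>
#include <string>

// Illustrative greedy-policy extraction for the 3x4 grid: acts greedily with
// respect to a converged value function V using one-step lookahead.
std::array<std::array<std::string, 4>, 3>
extractGreedyPolicy(const std::array<std::array<double, 4>, 3>& V, double gamma) {
    const char* arrows[4] = {"↑", "↓", "←", "→"};
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    std::array<std::array<std::string, 4>, 3> policy;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 4; ++c) {
            if (r == 0 && c == 3) { policy[r][c] = "GOAL"; continue; }
            if (r == 1 && c == 1) { policy[r][c] = "WALL"; continue; }
            double best = -1e18;
            for (int a = 0; a < 4; ++a) {
                int nr = r + dr[a], nc = c + dc[a];
                // Blocked moves (grid edge or wall) leave the agent in place.
                if (nr < 0 || nr >= 3 || nc < 0 || nc >= 4 ||
                    (nr == 1 && nc == 1)) { nr = r; nc = c; }
                double reward = (nr == 0 && nc == 3) ? 1.0
                              : (nr == 1 && nc == 3) ? -1.0 : 0.0;
                double q = reward + gamma * V[nr][nc];
                if (q > best) { best = q; policy[r][c] = arrows[a]; }
            }
        }
    }
    return policy;
}
```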
To build and run the project you will need:
- CMake 3.10 or higher
- A C++17-compatible compiler (GCC, Clang, or MSVC)
Navigate to the project directory:
cd ch04_cpp
Create and enter the build directory:
mkdir -p build
cd build
Generate build files with CMake:
cmake ..
Compile the project:
make
After successful compilation, run the executable:
./value_iteration
You should see output similar to the following:
Running Value Iteration with gamma = 0.9
Grid layout:
- Goal state: (0,3) with reward +1.0
- Wall state: (1,1) - impassable
- Trap state: (1,3) with reward -1.0
- Start state: (2,0)
Iteration 0:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 1:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.000  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.000  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 2:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 3:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.810  |  0.000  |
+---------+---------+---------+---------+
Iteration 4:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Iteration 5:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.656  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Converged after 6 iterations with delta = 0.000
Extracting greedy policy...
Final Value Function and Policy:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.656  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Policy (arrows show best action):
+---------+---------+---------+---------+
|    →    |    →    |    →    |  GOAL   |
+---------+---------+---------+---------+
|    ↑    |  WALL   |    ↑    |    ↑    |
+---------+---------+---------+---------+
|    ↑    |    →    |    ↑    |    ←    |
+---------+---------+---------+---------+