This is a C++ implementation of the Value Iteration algorithm for solving Markov Decision Processes (MDPs), specifically applied to a GridWorld environment.
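Value iteration repeatedly applies the Bellman optimality backup to every state until the value function stops changing:

$$V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V_k(s')\bigr]$$

The sample output below is consistent with deterministic transitions, in which case the sum over $s'$ collapses to the single successor state of each action.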
cpp/
├── CMakeLists.txt
├── README.md
├── include/
│   ├── gridworld.h
│   └── value_iteration.h
├── src/
│   ├── gridworld.cpp
│   ├── value_iteration.cpp
│   └── main.cpp
└── build/
GridWorld Environment: A 3x4 grid world with:
- Goal state at (0,3) with reward +1.0
- Wall state at (1,1) that is impassable
- Trap state at (1,3) with reward -1.0
- Start state at (2,0)
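For orientation, this layout could be captured in a small struct like the sketch below; the type and member names (`GridWorldSketch`, `reward`, and so on) are illustrative and are not taken from the actual `gridworld.h`.

```cpp
#include <utility>

// Illustrative sketch of the 3x4 layout; the real gridworld.h may expose a different API.
struct GridWorldSketch {
    static constexpr int kRows = 3;
    static constexpr int kCols = 4;

    std::pair<int, int> goal{0, 3};    // reward +1.0 on entry
    std::pair<int, int> trap{1, 3};    // reward -1.0 on entry
    std::pair<int, int> wall{1, 1};    // impassable cell
    std::pair<int, int> start{2, 0};   // agent's starting cell

    bool isWall(int r, int c) const { return std::pair<int, int>{r, c} == wall; }

    // Reward received when transitioning into cell (r, c).
    double reward(int r, int c) const {
        if (std::pair<int, int>{r, c} == goal) return 1.0;
        if (std::pair<int, int>{r, c} == trap) return -1.0;
        return 0.0;
    }
};
```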
Value Iteration Algorithm: Implements the classic value iteration algorithm with:
- Configurable discount factor (gamma)
- Convergence threshold
- Visualization of value function and policy
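A minimal sketch of the backup loop is shown below. It assumes deterministic moves, a stay-in-place rule when a move hits the wall or the grid edge, and a goal-only terminal state, which is consistent with the sample output further down; the constant names (`gamma`, `theta`) and the overall structure are illustrative rather than the project's actual `value_iteration.cpp`.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    constexpr int R = 3, C = 4;
    constexpr double gamma = 0.9;    // discount factor
    constexpr double theta = 1e-6;   // convergence threshold on the max value change
    const int goalR = 0, goalC = 3, trapR = 1, trapC = 3, wallR = 1, wallC = 1;

    std::array<std::array<double, C>, R> V{};  // value function, initialized to 0

    // Reward received when entering cell (r, c).
    auto reward = [&](int r, int c) {
        if (r == goalR && c == goalC) return 1.0;
        if (r == trapR && c == trapC) return -1.0;
        return 0.0;
    };

    const int dr[4] = {-1, 1, 0, 0};   // up, down, left, right
    const int dc[4] = {0, 0, -1, 1};

    int iter = 0;
    double delta = 0.0;
    do {
        delta = 0.0;
        auto newV = V;  // synchronous backup: read old values, write new ones
        for (int r = 0; r < R; ++r) {
            for (int c = 0; c < C; ++c) {
                if (r == wallR && c == wallC) continue;   // wall has no value
                if (r == goalR && c == goalC) continue;   // terminal goal stays at 0
                double best = -1e18;
                for (int a = 0; a < 4; ++a) {
                    int nr = r + dr[a], nc = c + dc[a];
                    // Moves off the grid or into the wall leave the agent in place.
                    if (nr < 0 || nr >= R || nc < 0 || nc >= C ||
                        (nr == wallR && nc == wallC)) { nr = r; nc = c; }
                    best = std::max(best, reward(nr, nc) + gamma * V[nr][nc]);
                }
                newV[r][c] = best;
                delta = std::max(delta, std::abs(newV[r][c] - V[r][c]));
            }
        }
        V = newV;
        ++iter;
    } while (delta > theta);

    std::printf("Converged after %d iterations (delta = %.3f)\n", iter, delta);
    return 0;
}
```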
Policy Extraction: Extracts the optimal greedy policy from the converged value function
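One way to perform that extraction, assuming a converged value function and the same deterministic dynamics as in the sketch above (the function name `extractGreedyPolicy` is illustrative, not the actual `value_iteration.h` API):

```cpp
#include <array>
#include <string>

// Illustrative greedy-policy extraction for the 3x4 grid: acts greedily with
// respect to a converged value function V using one-step lookahead.
std::array<std::array<std::string, 4>, 3>
extractGreedyPolicy(const std::array<std::array<double, 4>, 3>& V, double gamma) {
    const char* arrows[4] = {"↑", "↓", "←", "→"};
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    std::array<std::array<std::string, 4>, 3> policy;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 4; ++c) {
            if (r == 0 && c == 3) { policy[r][c] = "GOAL"; continue; }
            if (r == 1 && c == 1) { policy[r][c] = "WALL"; continue; }
            double best = -1e18;
            for (int a = 0; a < 4; ++a) {
                int nr = r + dr[a], nc = c + dc[a];
                // Blocked moves (grid edge or wall) leave the agent in place.
                if (nr < 0 || nr >= 3 || nc < 0 || nc >= 4 ||
                    (nr == 1 && nc == 1)) { nr = r; nc = c; }
                double reward = (nr == 0 && nc == 3) ? 1.0
                              : (nr == 1 && nc == 3) ? -1.0 : 0.0;
                double q = reward + gamma * V[nr][nc];
                if (q > best) { best = q; policy[r][c] = arrows[a]; }
            }
        }
    }
    return policy;
}
```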
To build and run the project you will need:
- CMake 3.10 or higher
- A C++17-compatible compiler (GCC, Clang, or MSVC)
Navigate to the project directory:
cd ch04_cpp
Create and enter the build directory:
mkdir -p build
cd build
Generate build files with CMake:
cmake ..
Compile the project:
make
After successful compilation, run the executable:
./value_iteration
You should see output similar to the following:
Running Value Iteration with gamma = 0.9
Grid layout:
- Goal state: (0,3) with reward +1.0
- Wall state: (1,1) - impassable
- Trap state: (1,3) with reward -1.0
- Start state: (2,0)
Iteration 0:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 1:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.000  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.000  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 2:
Value Function:
+---------+---------+---------+---------+
|  0.000  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.000  |  0.000  |
+---------+---------+---------+---------+
Iteration 3:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.000  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.000  |  0.810  |  0.000  |
+---------+---------+---------+---------+
Iteration 4:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.000  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Iteration 5:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.656  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Converged after 6 iterations with delta = 0.000
Extracting greedy policy...
Final Value Function and Policy:
Value Function:
+---------+---------+---------+---------+
|  0.810  |  0.900  |  1.000  |  0.000  |
+---------+---------+---------+---------+
|  0.729  |  WALL   |  0.900  |  1.000  |
+---------+---------+---------+---------+
|  0.656  |  0.729  |  0.810  |  0.729  |
+---------+---------+---------+---------+
Policy (arrows show best action):
+---------+---------+---------+---------+
|    →    |    →    |    →    |  GOAL   |
+---------+---------+---------+---------+
|    ↑    |  WALL   |    ↑    |    ↑    |
+---------+---------+---------+---------+
|    ↑    |    →    |    ↑    |    ←    |
+---------+---------+---------+---------+