Value-Iteration

Visualizations of Reinforcement Learning concepts including Value Iteration and Q-Learning

Primary language: Jupyter Notebook

State Value Iteration For FrozenLake8x8-v0

State Value Iteration assumes that the MDP is known. In repeated sweeps over the state space, it updates the value of each state from the available actions, the possible next states, their rewards, and the state transition probabilities. The discount factor gamma controls how much weight long-term rewards receive when state values are updated.
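The sweep described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the function name `value_iteration`, the stopping threshold `theta`, and the three-state `toy_P` model are all assumptions made for the example. The model layout, `P[state][action] = [(probability, next_state, reward, done), ...]`, is chosen to resemble the transition table that Gym's FrozenLake environments are commonly accessed through; treat that resemblance as an assumption and check it against your Gym version.

```python
from typing import Dict, List, Tuple

# One outcome of taking an action: (probability, next_state, reward, done)
Transition = Tuple[float, int, float, bool]
Model = Dict[int, Dict[int, List[Transition]]]


def value_iteration(P: Model, gamma: float, theta: float = 1e-8,
                    max_sweeps: int = 10_000) -> List[float]:
    """Sweep over all states until the largest value change drops below theta."""
    V = [0.0] * len(P)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(len(P)):
            # Expected return of each action, bootstrapping from the current V.
            # A transition flagged done contributes no future value.
            action_values = [
                sum(p * (r + gamma * (0.0 if done else V[s2]))
                    for p, s2, r, done in P[s][a])
                for a in P[s]
            ]
            best = max(action_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states in this sweep see it
        if delta < theta:
            break
    return V


# Hypothetical toy chain 0 -> 1 -> 2, where state 2 is a terminal goal.
# Action 0 stays in place, action 1 moves right; reward 1 on reaching state 2.
toy_P: Model = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

if __name__ == "__main__":
    for gamma in (1.0, 0.999, 0.95):
        print(gamma, value_iteration(toy_P, gamma))
```

In this toy chain the terminal state keeps a value of zero, and the value of state 0 shrinks as gamma decreases because its reward is one step further away.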

Here are some images showing state values after convergence. All terminal states have zero value. For plotting only, I set the value of the final state [8,8] to the maximum of all state values so that the graph looks nicer.
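The plotting adjustment can be sketched as follows. This is an illustrative helper, not the notebook's code: the name `to_plot_grid` is hypothetical, and it assumes the 64 state values are stored in row-major order, so that the final state [8,8] is the last entry of the flat list.

```python
from typing import List, Sequence


def to_plot_grid(values: Sequence[float], n_rows: int = 8,
                 n_cols: int = 8) -> List[List[float]]:
    """Reshape a flat list of state values into a grid for plotting.

    The bottom-right cell (the final state) is overwritten with the maximum
    state value. This is purely cosmetic; the converged values are unchanged.
    """
    if len(values) != n_rows * n_cols:
        raise ValueError("values must contain n_rows * n_cols entries")
    # Assumed row-major layout: state index = row * n_cols + col
    grid = [list(values[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    grid[-1][-1] = max(values)
    return grid
```

The returned nested list can be passed to any heatmap-style plotting routine; the original flat list is left untouched.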

See notebook for full code.

Gamma 1

Gamma 0.999

Gamma 0.95