Assignment from David Silver's Reinforcement Learning course. Coded for clarity, not efficiency.
Requires Torch7 with the Moses package.
Run monte-carlo.lua first to generate Q* and the plot of V (below), then sarsa-lambda.lua and lin-fun-approx.lua to generate their plots.
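The run order above can be sketched as a small shell loop. This is only an illustration: it assumes Torch7's `th` launcher is on your PATH, that Moses was installed through LuaRocks, and that you are in the repository directory. The `SCRIPTS` variable and the dry-run fallback are additions for this sketch, not part of the repository.

```shell
# Assumption: Moses is installed as a LuaRocks package, e.g.
#   luarocks install moses

# monte-carlo.lua is listed first because it must run before the other two.
SCRIPTS="monte-carlo.lua sarsa-lambda.lua lin-fun-approx.lua"

for script in $SCRIPTS; do
  if command -v th >/dev/null 2>&1 && [ -f "$script" ]; then
    # Stop at the first failure so later scripts do not run without Q*.
    th "$script" || break
  else
    # Dry run when th or the script is not available.
    echo "would run: th $script"
  fi
done
```

If `th` or a script is missing, the loop only prints the command it would have run, so it is safe to try outside a Torch7 setup.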
Includes an additional method without value functions, policy-gradient.lua, that uses a simple neural network.