Policy iteration does not work
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Attempt to use the policy iteration algorithm
What is the expected output? What do you see instead?
Policy iteration should iterate several times before converging to a
solution. Instead, it converges after exactly one iteration.
What version of the product are you using? On what operating system?
The version posted on http://aima.cs.berkeley.edu/python/mdp.html, using
Python 2.6
Please provide any additional information below.
I've attached a fixed version of the file. The only changed line is line
139:
U[s] = R(s) + gamma * sum([p * U[s1] for (p, s1) in T(s, pi[s])])
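The corrected line is a fixed-policy Bellman update. Each state's utility is backed up from the utilities of its successor states `s1` under the current policy `pi`. Below is a minimal, self-contained sketch of policy iteration that uses this update. It assumes a hypothetical four-state corridor MDP, and `STATES`, `TERMINALS`, `ACTIONS`, `GAMMA`, `R`, `T` and the helper functions are illustrative only, not the AIMA module's API. It is also written for Python 3 rather than the Python 2.6 mentioned above.

```python
# Hypothetical MDP for illustration: states 0..3 on a line, state 3 is terminal.
STATES = [0, 1, 2, 3]
TERMINALS = {3}
ACTIONS = ["left", "right"]
GAMMA = 0.9


def R(s):
    """Reward: +1 in the terminal state, 0 elsewhere."""
    return 1.0 if s == 3 else 0.0


def T(s, a):
    """Deterministic transitions as a list of (probability, next_state) pairs."""
    if s in TERMINALS:
        return []  # no successors, so a terminal's utility is just R(s)
    s1 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return [(1.0, s1)]


def expected_utility(a, s, U):
    """Expected utility of the successor state when taking action a in s."""
    return sum(p * U[s1] for (p, s1) in T(s, a))


def policy_evaluation(pi, U, k=20):
    """Approximate the utilities of policy pi with k in-place sweeps."""
    for _ in range(k):
        for s in STATES:
            a = pi.get(s)  # terminals have no entry in pi
            # Corrected backup: use the successor's utility U[s1].
            U[s] = R(s) + GAMMA * sum(p * U[s1] for (p, s1) in T(s, a))
    return U


def policy_iteration():
    """Return (policy, utilities, number of evaluate/improve rounds)."""
    U = {s: 0.0 for s in STATES}
    # Deliberately poor starting policy: always move away from the goal.
    pi = {s: "left" for s in STATES if s not in TERMINALS}
    rounds = 0
    while True:
        rounds += 1
        U = policy_evaluation(pi, U)
        unchanged = True
        for s in pi:
            best = max(ACTIONS, key=lambda a: expected_utility(a, s, U))
            # Switch only on a strict improvement so ties cannot cause cycling.
            if expected_utility(best, s, U) > expected_utility(pi[s], s, U):
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi, U, rounds


policy, utilities, rounds = policy_iteration()
print(rounds, policy)  # prints: 4 {0: 'right', 1: 'right', 2: 'right'}
```

Starting from the always-`left` policy, the sketch needs four evaluate/improve rounds before the policy stops changing. Each round lets the goal's utility propagate one more state back along the corridor, which matches the multi-iteration behavior the report expects. Accepting a new action only on a strict improvement is a design choice made here for safety against tie-induced cycling.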
Original issue reported on code.google.com by srbur...@gmail.com
on 29 Apr 2010 at 5:54
GoogleCodeExporter commented
Fixed in r30.
Original comment by wit...@gmail.com
on 15 Sep 2011 at 4:19
GoogleCodeExporter commented
Original comment by wit...@gmail.com
on 15 Sep 2011 at 4:20
- Changed state: Fixed