Inconsistent results
JorenCoulier opened this issue · 1 comment
The resulting near-optimal value function differs between executions of the same problem, and the run-to-run differences are much larger than the small convergence threshold.
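For context, a generic textbook value iteration is fully deterministic once the transition and reward model are fixed, so differences of this size suggest randomness somewhere upstream (e.g. in how the problem or its transitions are generated). A minimal self-contained sketch illustrating this, not the repository's code:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, threshold=1e-7):
    """Plain textbook value iteration on a small MDP.

    P[a, s, s'] are transition probabilities, R[a, s] are rewards.
    Generic illustration only, not the repository's implementation.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < threshold:
            return V_new
        V = V_new

# A fixed 2-state, 2-action MDP.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# With fixed inputs, two runs agree to within the threshold.
v1 = value_iteration(P, R)
v2 = value_iteration(P, R)
print(np.max(np.abs(v1 - v2)))  # effectively 0.0
```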
When running some of the available tests, 'test_transition' in 'test_mdp.py' failed with one of the following assertion errors:
```
AssertionError: 0.0 != 0.05 within 7 places (0.05 difference)
AssertionError: 0.9999999999999998 != 0.05 within 7 places (0.9499999999999997 difference)
AssertionError: 0.0 != 1.0 within 7 places (1.0 difference)
```
Which of these assertion errors is thrown appears to be completely random, varying from one run of the test to the next.
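One way to check whether unseeded randomness is the cause (a guess on my part, not something confirmed in the repository): seed the global RNGs before every test and see whether the failure becomes reproducible. A hypothetical `conftest.py`, assuming the tests use Python's `random` module or NumPy's global RNG:

```python
# conftest.py (hypothetical): seed the global RNGs before every test so
# failures become reproducible if they stem from unseeded randomness.
import random

import numpy as np
import pytest

@pytest.fixture(autouse=True)
def fixed_seed():
    random.seed(0)
    np.random.seed(0)
```

If the same assertion error then appears on every run, the flakiness comes from unseeded RNG calls rather than from the algorithm itself.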
The test in 'test_value_iteration.py' also fails, and the policy that causes its assertion error likewise differs between executions of the test.