Making it easier to ensure convergence
Closed this issue · 6 comments
I sometimes bump into edges of the library when an algorithm doesn't have enough iterations to ensure convergence. Since our default values for the number of iterations are relatively low (100 for LAO* and 50 for VI), I think we should address this to make the library harder to misuse. I can think of two strategies for solving this in the library:
- When we can count the total number of states (TabularMDP), we should initialize the number of iterations appropriately. For example, for deterministic policies in value iteration, we should initialize the number of iterations to (approx) the total number of states.
- Another approach is to track convergence and report it as part of the results object; for example, LRTDP can report something like `converged=all(res.solved.values())`, and VI can report the largest Bellman residual.
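Both strategies can be illustrated together in a small sketch. This is not the library's actual API — the function name, result keys, and the `max_iterations = n_states` default are all hypothetical — but it shows a VI loop whose iteration budget is derived from the state count (appropriate for deterministic shortest-path-style problems, per the first bullet) and whose result reports convergence status and the final Bellman residual (per the second bullet):

```python
import numpy as np

def value_iteration(T, R, discount, epsilon=1e-8, max_iterations=None):
    """Toy value iteration over an explicit (tabular) state space.

    T: (S, A, S) transition probabilities; R: (S, A) rewards.
    All names and the results-dict shape are hypothetical sketches,
    not the library's real interface.
    """
    n_states = T.shape[0]
    if max_iterations is None:
        # Derive the iteration budget from the state count instead of a
        # fixed default; ~|S| sweeps suffice for deterministic problems.
        max_iterations = n_states
    V = np.zeros(n_states)
    residual = float("inf")
    for _ in range(max_iterations):
        Q = R + discount * (T @ V)   # shape (S, A)
        V_new = Q.max(axis=1)
        residual = np.abs(V_new - V).max()
        V = V_new
        if residual < epsilon:
            break
    # Report convergence as part of the results object.
    return {
        "value": V,
        "max_bellman_residual": residual,
        "converged": residual < epsilon,
    }

# Usage: a deterministic 4-state chain with an absorbing goal at state 3.
# Action 0 moves right, action 1 stays; step cost -1, goal cost 0.
T = np.zeros((4, 2, 4))
for s in range(3):
    T[s, 0, s + 1] = 1.0
    T[s, 1, s] = 1.0
T[3, :, 3] = 1.0
R = np.full((4, 2), -1.0)
R[3, :] = 0.0
res = value_iteration(T, R, discount=0.95)
```

With the budget tied to `n_states`, the deterministic chain converges exactly within its four sweeps, and a caller can check `res["converged"]` programmatically instead of eyeballing the values.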
These both sound great to implement. Another possibility is to have a convergence threshold parameter for VI and then print a warning if the final Bellman residual isn't below the threshold.
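A minimal sketch of what that warning check might look like — the helper name, the `threshold` default, and the warning text are all hypothetical, not part of the library:

```python
import warnings

def check_convergence(max_bellman_residual, threshold=1e-6):
    """Hypothetical helper: warn (rather than fail) when VI's final
    Bellman residual exceeds the convergence threshold."""
    if max_bellman_residual > threshold:
        warnings.warn(
            f"VI may not have converged: final Bellman residual "
            f"{max_bellman_residual:.2e} exceeds threshold {threshold:.2e}"
        )
        return False
    return True
```

Returning a boolean alongside the warning keeps the check usable both interactively (the warning prints) and in code (the caller can branch on the result).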
My counter to something like a warning is that it's difficult to enforce programmatically; this comes up for me when I'm doing some sort of meta-optimization and can't inspect solutions by hand, but still want to ensure they're converged/correct.
Whoops, right. So is the idea that it should throw an error if it doesn't converge?
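One way to support both use cases is an opt-in exception. Again a hypothetical sketch (the exception class, helper, and flag names are invented for illustration), assuming the results object carries the `converged` field proposed earlier in the thread:

```python
class ConvergenceError(RuntimeError):
    """Hypothetical exception raised when a planner fails to converge."""

def ensure_converged(result, raise_on_failure=True):
    """Check the `converged` field of a results object.

    With raise_on_failure=True this enforces convergence programmatically
    (e.g. inside a meta-optimization loop); with False it just reports.
    """
    if not result.get("converged", False):
        if raise_on_failure:
            raise ConvergenceError(
                "planner did not converge within its iteration budget"
            )
        return False
    return True
```

Making the exception opt-in keeps exploratory use (where partial results are still informative) from breaking, while automated pipelines can fail fast.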