markkho/msdm

Making it easier to ensure convergence

Closed this issue · 6 comments

cgc commented

I sometimes run into rough edges in the library when an algorithm doesn't have enough iterations to ensure convergence. Since our default iteration counts are relatively low (100 for LAO* and 50 for VI), I think we should address this so that the library is hard to misuse. I can think of two strategies for solving this in the library:

  • When we can count the total number of states (TabularMDP), we should set the number of iterations accordingly. For example, for deterministic policies in value iteration, we should set the number of iterations to approximately the total number of states.
  • Another approach is to track convergence and report it as part of the results object. For example, LRTDP can report something like converged=all(res.solved.values()), and VI can report the largest Bellman residual.
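A minimal sketch of both ideas for value iteration, written in plain Python rather than against msdm's actual API: when the states can be counted, the iteration budget defaults to the number of states, and the result reports the largest Bellman residual from the final sweep. `value_iteration`, `VIResult`, and the callable-based MDP interface are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional


@dataclass
class VIResult:
    values: Dict[Hashable, float]
    max_residual: float  # largest Bellman residual from the final sweep
    iterations: int      # number of sweeps actually performed


def value_iteration(
    states: List[Hashable],
    actions: Callable[[Hashable], List[Hashable]],
    transition: Callable[[Hashable, Hashable], Dict[Hashable, float]],
    reward: Callable[[Hashable, Hashable, Hashable], float],
    discount: float = 1.0,
    max_iterations: Optional[int] = None,
) -> VIResult:
    # Hypothetical sketch, not msdm's API. States with no actions are
    # treated as terminal and keep a value of 0.
    if max_iterations is None:
        # A tabular MDP lets us count states, so use that as the default budget.
        max_iterations = len(states)
    values = {s: 0.0 for s in states}
    max_residual = float("inf")
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        new_values = {}
        for s in states:
            # Bellman backup: best expected one-step return over available actions.
            new_values[s] = max(
                (
                    sum(
                        p * (reward(s, a, ns) + discount * values[ns])
                        for ns, p in transition(s, a).items()
                    )
                    for a in actions(s)
                ),
                default=0.0,  # terminal states
            )
        max_residual = max(
            (abs(new_values[s] - values[s]) for s in states), default=0.0
        )
        values = new_values
        if max_residual == 0.0:
            break  # exact fixed point reached
    return VIResult(values, max_residual, iterations)
```

On a deterministic 5-state chain where each step costs 1, this reaches an exact fixed point in 5 sweeps with the default budget, while `max_iterations=3` stops early and reports a residual of 1.0, which is exactly the kind of signal a caller can check.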


Both of these sound great to implement. Another possibility is to add a convergence threshold parameter to VI and then print a warning if the final Bellman residual isn't below the threshold.
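The threshold-plus-warning idea could look roughly like the sketch below, using Python's standard `warnings` module. The helper name `warn_if_not_converged` and the `convergence_threshold` parameter are hypothetical, not existing msdm options; returning the boolean as well means the same check can also feed a result flag.

```python
import warnings


def warn_if_not_converged(max_residual, convergence_threshold=1e-6, algorithm="ValueIteration"):
    """Hypothetical helper: warn when the final Bellman residual exceeds the threshold."""
    converged = max_residual <= convergence_threshold
    if not converged:
        warnings.warn(
            f"{algorithm} did not converge: final Bellman residual {max_residual:g} "
            f"exceeds threshold {convergence_threshold:g}; "
            "consider increasing the number of iterations.",
            RuntimeWarning,
        )
    return converged
```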

cgc commented

My concern with something like a warning is that it's difficult to enforce programmatically. This comes up for me when I'm doing some sort of meta-optimization and can't inspect solutions by hand, but still want to ensure they're converged and correct.

Whoops, right. So is the idea that it should throw an error if it doesn't converge?
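One way to get programmatic enforcement without changing a planner's default behavior is for the caller to raise on a non-converged result, for example inside a meta-optimization loop. A minimal sketch, assuming a result object with `converged` and `max_residual` attributes; `ConvergenceError`, `PlanningResult`, and `require_converged` are illustrative names, not part of msdm.

```python
from dataclasses import dataclass


class ConvergenceError(RuntimeError):
    """Hypothetical exception for planner results that did not converge."""


@dataclass
class PlanningResult:
    # Hypothetical stand-in for a planner's results object.
    converged: bool
    max_residual: float


def require_converged(result):
    # Fail loudly instead of silently using an unconverged solution.
    if not result.converged:
        raise ConvergenceError(
            f"planner did not converge (max residual {result.max_residual:g})"
        )
    return result
```

Leaving the choice to raise with the caller keeps interactive use lenient, while automated pipelines can opt into a hard failure.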

cgc commented

OK, great. I've added a convergence flag alongside the warning in bc7b97b.

Did the same for LAO* and LRTDP (508dd59).
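For LRTDP, the flag proposed earlier in the thread, converged=all(res.solved.values()), can be derived from the solved labels. The sketch below uses a hypothetical `LRTDPResult` container, not msdm's actual results class. One caveat: the standard LRTDP stopping rule only requires the initial state to be labeled solved, so requiring every recorded state to be solved may be stricter than the algorithm's own termination condition.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class LRTDPResult:
    """Hypothetical result container; msdm's actual fields may differ."""
    solved: Dict[Hashable, bool] = field(default_factory=dict)  # state -> solved label

    @property
    def converged(self) -> bool:
        # Mirrors converged=all(res.solved.values()).
        # Note: all() over an empty dict is True.
        return all(self.solved.values())
```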