Making it easier to ensure convergence

Question

Closed this issue 3 years ago · 6 comments

I sometimes run into rough edges in the library when an algorithm isn't given enough iterations to converge. Since our default iteration limits are relatively low (100 for LAO* and 50 for VI), I think we should address this so that the library is hard to misuse. I can think of two strategies for solving this in the library:

When we can count the total number of states (TabularMDP), we should initialize the number of iterations appropriately. For example, for deterministic policies in value iteration, we should initialize the number of iterations to (approx) the total number of states.
Another approach is to track convergence and report it as part of the results object; for example, LRTDP can report something like converged=all(res.solved.values()), and VI can report the largest Bellman residual.
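To make the two strategies concrete, here is a minimal sketch of value iteration that stops early and reports convergence. It is not the library's implementation: VIResult, value_iteration, and the 10-state chain are hypothetical names made up for illustration. On the chain, a sweep budget smaller than the number of states leaves a residual of 1.0, while a budget above it converges.

```python
from dataclasses import dataclass


@dataclass
class VIResult:
    """Hypothetical results object; not the library's actual class."""
    values: dict          # state -> estimated value
    iterations: int       # number of sweeps actually performed
    max_residual: float   # largest Bellman residual in the final sweep
    converged: bool       # True if max_residual fell below the tolerance


def value_iteration(states, actions, transition, reward,
                    discount=0.95, max_iterations=50, tolerance=1e-6):
    """transition(s, a) -> {next_state: prob}; reward(s, a, ns) -> float."""
    values = {s: 0.0 for s in states}
    max_residual = float("inf")
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        new_values = {}
        for s in states:
            # Bellman backup: best expected one-step return over actions.
            new_values[s] = max(
                sum(p * (reward(s, a, ns) + discount * values[ns])
                    for ns, p in transition(s, a).items())
                for a in actions
            )
        max_residual = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if max_residual < tolerance:
            break  # early termination once the values stop changing
    return VIResult(values, iterations, max_residual,
                    max_residual < tolerance)


# Hypothetical deterministic chain: states 0..9, state 9 is an absorbing goal.
N = 10
chain_states = list(range(N))
chain_actions = ["stay", "right"]


def chain_transition(s, a):
    if s == N - 1 or a == "stay":
        return {s: 1.0}
    return {s + 1: 1.0}


def chain_reward(s, a, ns):
    return 0.0 if s == N - 1 else -1.0  # each step before the goal costs 1


short = value_iteration(chain_states, chain_actions, chain_transition,
                        chain_reward, discount=1.0, max_iterations=5)
full = value_iteration(chain_states, chain_actions, chain_transition,
                       chain_reward, discount=1.0, max_iterations=50)
print(short.converged, short.max_residual)  # budget too small for the chain
print(full.converged, full.iterations)
```

Returning both the flag and the raw residual lets callers pick their own tolerance after the fact.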

Answer 1 · 2021-03-22T15:47:33.000Z

These both sound great to implement. Another possibility is to have a convergence threshold parameter for VI and then print a warning if the final Bellman residual isn't below the threshold.
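A rough sketch of the threshold-plus-warning idea, using Python's standard warnings module. The function name check_residual and its default threshold are assumptions for illustration, not something the library defines.

```python
import warnings


def check_residual(max_residual, threshold=1e-5):
    """Hypothetical helper: warn if the final Bellman residual is too large.

    Returns True when the residual is below the threshold.
    """
    converged = max_residual < threshold
    if not converged:
        warnings.warn(
            f"Value iteration stopped with Bellman residual "
            f"{max_residual:.3g}, which is not below {threshold:.3g}; "
            f"consider raising the iteration limit.",
            RuntimeWarning,
            stacklevel=2,  # point the warning at the caller
        )
    return converged
```

Because the helper also returns the flag, the same check can feed a results object instead of relying on the printed warning alone.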

Answer 2 · 2021-03-30T15:17:41.000Z

My concern with something like a warning is that it's difficult to enforce programmatically; this comes up for me when I'm doing some sort of meta-optimization and can't inspect solutions by hand but want to ensure they're converged / correct.

Answer 3 · 2021-03-30T15:59:46.000Z

whoops right - yea so is the idea that it should throw an error if it doesn't converge?

Answer 4 · 2021-03-30T16:33:25.000Z

I think a middle road is to encourage Results objects to carry information about convergence. That lets authors decide flexibly how to handle the issue. I still think it’s important for libraries to detect convergence and use that for early termination; the ultimate goals that motivate this issue for me are 1) greater efficiency from detecting convergence inside algorithms and 2) making it possible to ensure, from outside algorithms, that convergence occurred.
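As an illustration of goal 2, here is a sketch of how calling code might enforce convergence on its own terms, for example inside a meta-optimization loop. PlanningResult, NotConvergedError, require_converged, and best_converged are all hypothetical names; the only assumption about the library is that results expose a boolean converged attribute.

```python
from dataclasses import dataclass


class NotConvergedError(RuntimeError):
    """Hypothetical error raised when a planner result did not converge."""


@dataclass
class PlanningResult:
    """Stand-in for a results object that carries a convergence flag."""
    value: float
    converged: bool


def require_converged(result):
    """Strict policy: refuse to use a result that did not converge."""
    if not result.converged:
        raise NotConvergedError("planner stopped before converging")
    return result


def best_converged(candidates):
    """Lenient policy: pick the best label among converged results only.

    candidates is an iterable of (label, result) pairs.
    """
    usable = [(label, r) for label, r in candidates if r.converged]
    if not usable:
        raise NotConvergedError("no candidate converged")
    return max(usable, key=lambda pair: pair[1].value)[0]
```

Keeping the policy in the caller means the algorithm only has to report what happened, and each use case can choose between raising, skipping, or rerunning with a larger budget.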

Answer 5 · 2021-03-30T16:51:23.000Z

Ok great - I've added a convergence flag here bc7b97b alongside the warning.

Answer 6 · 2021-03-30T17:01:49.000Z

Did the same for LAO* and LRTDP (508dd59)