kaskr/adcomp

improved error handling/documentation for inner optimization

bbolker opened this issue · 8 comments

In the inner optimization block within the ff() internal function, there is a try() statement around the Newton's method evaluation. If this fails (the condition is on the return value opt: !is.list(opt) || !is.finite(opt$value); as a side point, I think inherits(opt, "try-error") would be a more precise test for the first part of the condition), then a single NaN value is returned. This leads to a rather obscure error about (e.g.) the gradient returning a result of length 1 when a result of the full parameter length is expected (from optim or optimHess or nlminb, depending on exactly where the error happens). I would suggest that returning rep(NaN, length(par.fixed)) (I think that's the right dimension?) would be better; it would pass the NaN values up the chain and still trigger an error somewhere, but that would be an improvement on the current behaviour.
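To make that concrete, here is a tiny standalone sketch; ff_sketch, opt, and par.fixed below are just stand-ins for the real internals, with the failure simulated by a stop():

ff_sketch <- function(par.fixed) {
    ## simulate the try()-wrapped inner Newton evaluation failing
    opt <- try(stop("inner Newton evaluation failed"), silent = TRUE)
    if (inherits(opt, "try-error") || !is.finite(opt$value)) {
        ## suggested: NaN vector of the outer-parameter length, not a single NaN
        return(rep(NaN, length(par.fixed)))
    }
    opt$value
}
ff_sketch(par.fixed = c(b0 = 0, b1 = 0))   # NaN NaN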

A wish-list item would be for this condition to also throw some kind of informative warning (e.g. "inner Newton optimization failed; see the random-effects part of the 'Details' section in ?MakeADFun, and ?newton").
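i.e. something as simple as this (the message text and the inner_failed flag are purely illustrative):

inner_failed <- TRUE   # hypothetical flag standing in for the failed Newton evaluation
if (inner_failed)
    warning("inner Newton optimization failed; see the random-effects part of ",
            "'Details' in ?MakeADFun, and ?newton")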

However, from what I can gather so far, the default newton settings are already at their slowest/most robust values (smartsearch = TRUE, maxit = 100). As far as I can see (I may have missed something!), the alternative to inner.method = "newton" (using optim for the inner optimization, which actually seems to be a workable fallback for the case I was running into trouble with) doesn't seem to be documented anywhere?
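For reference, this is roughly how those settings are passed (dat, pars, "u", and "mymodel" are placeholders, and I'm assuming inner.control is forwarded to newton() as described in ?MakeADFun):

library(TMB)
## placeholders: dat, pars, "u", and "mymodel" depend on the model at hand
obj <- MakeADFun(dat, pars, random = "u", DLL = "mymodel",
                 inner.method  = "newton",
                 inner.control = list(smartsearch = TRUE, maxit = 100))
## the (apparently undocumented) optim()-based alternative mentioned above
obj2 <- MakeADFun(dat, pars, random = "u", DLL = "mymodel",
                  inner.method = "BFGS")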

Does it make sense, or is it "too clever", to attempt inner optimization with optim() as an automatic fallback option if the newton evaluation fails?

I would be happy to work on a pull request with one or more of these components if it seems warranted.

I apologize for not having a reproducible example - so far I have only seen this error in rather complex cases - but will see if I can dig one up. This error is listed as a possibility in the Errors section of the glmmTMB 'troubleshooting' vignette, but without an example ...

Updates:

  1. There is a reproducible example here.
  2. There is some information printed if silent=FALSE in MakeADFun() (but more warnings might still be nice).
kaskr commented

Thanks for the reproducible example. I admit the error message is misleading.

I would suggest that returning rep(NaN, length(par.fixed))

Your example demonstrates why this is (surprisingly) not a good idea. By returning a vector of the right length you'll make optimHess spend ca. 10 minutes computing a Hessian of NaNs rather than stopping right away. I claim that most TMB users would prefer the current behaviour.

A wish-list item would be that this condition would also throw some kind of informative warning

That should be doable, but I'm not sure all users would like it. It's perfectly normal for the inner newton optimizer to fail now and then, even for well-behaved models.

use optim for the inner optimization - this actually seems to be a workable fallback

This is a very unstable solution. Have you ever observed inner.method="BFGS" to work for any TMB model?

Does it make sense, or is it "too clever", to attempt inner optimization with optim() as an automatic fallback option if the newton evaluation fails?

Definitely "too clever".

Normally the best way forward is to get some understanding of what's causing the problem.
FWIW, in your test example the large number of beta parameters suggests stabilizing by passing profile="beta" to MakeADFun.
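i.e. something along these lines (dat, pars, "u", and "mymodel" are placeholders; "beta" should match the parameter name in your template):

obj <- MakeADFun(dat, pars,
                 random  = "u",
                 profile = "beta",   # move beta into the inner (profiled) problem
                 DLL     = "mymodel")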

OK.

  • what is the best way to implement an informative error message at this point? Clearly returning NaN is OK if ff() is being called in some context that can handle that as a return value. We can't wait for optimHess to throw the error. Are we essentially always in trouble if the newton evaluation fails when order=1, i.e. would this work ...
if (inherits(opt, "try-error") || !is.finite(opt$value)) {
    if (order == 0) return(NaN)
    if (order == 1) stop("inner newton optimization failed during gradient calculation")
}

? Or is it only a problem if the inner optimization fails when calling optimHess? (Should there be a try() statement around optimHess ... ??)

I agree that stabilizing the fit is a better idea. Can you point me to a discussion of profiling? This sounds like what nAGQ=0 does in glmer ...

kaskr commented

would this work

Yes, I think so. I'll make the change myself at some point; there's too much going on in another branch that will soon be merged.

There's a small section on profiling here: http://kaskr.github.io/adcomp/_book/Appendix.html#profiling-the-inner-problem

Thanks! The profiling section is very useful. I think this indeed explains what's going on with nAGQ=0 better than I had understood it previously (i.e. glmer allows profiling of the fixed effects even in cases where the underlying assumptions are violated, on the grounds that it can still be a useful approximation; by default, in fact, it does a two-stage optimization with profiled fixed effects at the first stage and non-profiled fixed effects at the second stage).

I'll leave this open for you to close when you implement the change above.

Returning to this ... above, @kaskr says that the best way forward is

to get some understanding of what's causing the problem

Other than all the generic strategies I know of (looking for badly scaled variables, extreme values suggesting parameters converging to the boundary of the feasible space, computations that should be stabilized by doing them in log space, ...), are there suggested ways to try to understand/diagnose what's going on?

kaskr commented

@bbolker I was mainly thinking in terms of theoretical understanding using pen and paper. What kind of optimization problem is it (convex, partially convex, bi-quadratic, etc.)? For example, one nasty property to look out for: is it possible to choose outer parameters such that the inner problem becomes singular? If yes, the outer problem will be rewarded for moving towards these problematic parameters, causing infinitely bad Laplace accuracy - see here.
If the problem is structural, it's often possible to boil it down to a toy example with just a handful of parameters. To get there, I often take the parameter vector where it went wrong (badpar <- obj$env$last.par) and study the gradient (obj$env$f(badpar, order=1)) and the eigenvalues of the sparse Hessian (obj$env$spHess(badpar, random=TRUE)).
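In code, those diagnostics look roughly like this (assuming obj is the MakeADFun() object whose inner optimization just failed; densifying the Hessian for eigen() is only practical for smallish problems):

badpar <- obj$env$last.par                    # parameters at the point of failure
gr <- obj$env$f(badpar, order = 1)            # gradient at badpar
H  <- obj$env$spHess(badpar, random = TRUE)   # sparse Hessian of the random-effect block
range(gr)                                     # look for NaN or very large entries
min(eigen(as.matrix(H), only.values = TRUE)$values)   # near zero or negative => (nearly) singular inner problem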
Another useful trick for robustness and NaN debugging is to tabulate and print the AD tapes (ADFun and ADGrad):

TMB:::op_table(obj$env$ADFun)    # tabulate the operators used on the tape
TMB:::tape_print(obj$env$ADFun)  # print the tape itself
## the same calls work for obj$env$ADGrad

thanks!