residuals.gllvm() fails when a == b in runif()

Question

residuals.gllvm() fails when a == b in runif()

Closed this issue 2 years ago · 3 comments

Hi, in one of my models, I got the following error message when running residuals(fit) and the associated functions (e.g., plot):

Error in if (u == 1) u = 1 - 1e-16 : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In runif(n = 1, min = a, max = b) : NAs produced

Looking into the source code, it seems that it is due to the generation of Dunn-Symth residuals---this chunk in residuals.gllvm:

        if (object$family == "ZIP") {
            a <- pzip(as.vector(unlist(y[i, j])) - 1, mu = mu[i, j], sigma = object$params$phi[j])
            b <- pzip(as.vector(unlist(y[i, j])), mu = mu[i, j], sigma = object$params$phi[j])
            u <- runif(n = 1, min = a, max = b)
            if(u==1) u=1-1e-16
            if(u==0) u=1e-16
            ds.res[i, j] <- qnorm(u)
        }

In my case, a can be identical to b (both are 1 in my case), so they caused runif(n = 1, min = a, max = b) to create NaN: Warning message: In runif(n = 1, min = a, max = b) : NAs produced and then the next line to fail to assess the if condition.

Answer 1 · 2021-08-11T06:19:47.000Z

Thanks for posting your issue!

In my experience, this generally occurs in particularly poor fitting models. As such, I don't consider this a bug, but more a point of improvement (i.e. a more helpful error message should be given). Do you have any other indicators that the model poorly fits to your data? E.g., is there chance of overfitting due to a high degree of complexity (large number of parameters), collinearity between predictors, lack of convergence, non-singular Hessian, et cetera?

The first thing you might try is to re-fit to model from different starting values, to see if getting to a different (hopefully better) solution solves your issue.

Answer 2 · 2021-08-11T21:36:31.000Z

Ah, thanks so much for the tips @BertvanderVeen . It is a microbial dataset so it's inherently high in number of parameters...

The model convergence returned TRUE, but as I followed your suggestions and examine the Hessian, I got at least one singular value:

> solve(mod$Hess$Hess.full)
Error in solve.default(mod$Hess$Hess.full) : 
  Lapack routine dgesv: system is exactly singular: U[763,763] = 0

The coefs reported in the summary table all have reasonable SEs, so i suspect that this problematic coef is one of the LVs. But I don't know how to examine the estimation errors of the LVs (the helpfile of gllvm v1.3.1 says that gllvm returns an A object, but I couldn't find it in my model object?)

But you're right, this should not be considered a bug. Feel free to close this issue!

Answer 3 · 2021-08-12T06:30:52.000Z

No problem at all! Feedback like yours is very helpful in making the R-package more userfriendly.

Please note that mod$Hess$Hess.full also includes entries for parameter estimates that are fixed to zero, since they are not applicable to your model, such as the cut-offs for the ordinal (so that it never is invertible). Hence, the correct Hessian is mod$Hess$Hess.full[mod$Hess$incl,mod$Hess$incl].

I still intend to write a vignette on convergence checking in gllvm, but I haven't gotten around to do that yet. Regardless, my suggestion would be to check if that matrix has any negative eigenvalues, i.e.: any(eigen(mod$Hess$Hess.full[mod$Hess$incl,mod$Hess$incl])$val<0), indicating that the matrix is not positive semi-definite.

Also, feel free to check the magnitude of the gradient (also note that there is an automatic option to do this in the control argument of gllvm), by hist(mod$TMBfn$gr(mod$TMBfn$par)).

Try to re-run your model from different starting values (see documentation for gllvm, but e.g., starting.val = "zero", or starting.val = "res" in combination with n.init = 3 and jitter.var=0.4, or with a different optimizer; currently optim and nlminb are supported).

Convergence checking in mixed-effects models in difficult in general, but this should be a start.