JenniNiku/gllvm

residuals.gllvm() fails when a == b in runif()

Closed this issue · 3 comments

hrlai commented

Hi, in one of my models, I got the following error message when running residuals(fit) and the associated functions (e.g., plot):

Error in if (u == 1) u = 1 - 1e-16 : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In runif(n = 1, min = a, max = b) : NAs produced

Looking into the source code, it seems that it is due to the generation of Dunn-Symth residuals---this chunk in residuals.gllvm:

        if (object$family == "ZIP") {
            a <- pzip(as.vector(unlist(y[i, j])) - 1, mu = mu[i, j], sigma = object$params$phi[j])
            b <- pzip(as.vector(unlist(y[i, j])), mu = mu[i, j], sigma = object$params$phi[j])
            u <- runif(n = 1, min = a, max = b)
            if(u==1) u=1-1e-16
            if(u==0) u=1e-16
            ds.res[i, j] <- qnorm(u)
        }

In my case, a can be identical to b (both are 1 in my case), so they caused runif(n = 1, min = a, max = b) to create NaN: Warning message: In runif(n = 1, min = a, max = b) : NAs produced and then the next line to fail to assess the if condition.

Thanks for posting your issue!

In my experience, this generally occurs in particularly poor fitting models. As such, I don't consider this a bug, but more a point of improvement (i.e. a more helpful error message should be given). Do you have any other indicators that the model poorly fits to your data? E.g., is there chance of overfitting due to a high degree of complexity (large number of parameters), collinearity between predictors, lack of convergence, non-singular Hessian, et cetera?

The first thing you might try is to re-fit to model from different starting values, to see if getting to a different (hopefully better) solution solves your issue.

hrlai commented

Ah, thanks so much for the tips @BertvanderVeen . It is a microbial dataset so it's inherently high in number of parameters...

The model convergence returned TRUE, but as I followed your suggestions and examine the Hessian, I got at least one singular value:

> solve(mod$Hess$Hess.full)
Error in solve.default(mod$Hess$Hess.full) : 
  Lapack routine dgesv: system is exactly singular: U[763,763] = 0

The coefs reported in the summary table all have reasonable SEs, so i suspect that this problematic coef is one of the LVs. But I don't know how to examine the estimation errors of the LVs (the helpfile of gllvm v1.3.1 says that gllvm returns an A object, but I couldn't find it in my model object?)

But you're right, this should not be considered a bug. Feel free to close this issue!

No problem at all! Feedback like yours is very helpful in making the R-package more userfriendly.

Please note that mod$Hess$Hess.full also includes entries for parameter estimates that are fixed to zero, since they are not applicable to your model, such as the cut-offs for the ordinal (so that it never is invertible). Hence, the correct Hessian is mod$Hess$Hess.full[mod$Hess$incl,mod$Hess$incl].

I still intend to write a vignette on convergence checking in gllvm, but I haven't gotten around to do that yet. Regardless, my suggestion would be to check if that matrix has any negative eigenvalues, i.e.: any(eigen(mod$Hess$Hess.full[mod$Hess$incl,mod$Hess$incl])$val<0), indicating that the matrix is not positive semi-definite.

Also, feel free to check the magnitude of the gradient (also note that there is an automatic option to do this in the control argument of gllvm), by hist(mod$TMBfn$gr(mod$TMBfn$par)).

Try to re-run your model from different starting values (see documentation for gllvm, but e.g., starting.val = "zero", or starting.val = "res" in combination with n.init = 3 and jitter.var=0.4, or with a different optimizer; currently optim and nlminb are supported).

Convergence checking in mixed-effects models in difficult in general, but this should be a start.