JenniNiku/gllvm

Does gllvm deal with correlated covariates internally?


Hi, this is not really an issue, more a question about the internals of gllvm. Does the package do something like a QR decomposition of the design matrix by default to make optimisation more efficient? I couldn't find anything like that in the codebase, but the package seems quite robust in the face of strong multicollinearity.
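To make concrete what I mean, here is a minimal sketch of an lm()-style QR reparameterisation; this is purely illustrative and not taken from the gllvm codebase.

# Sketch of a QR reparameterisation of a design matrix (illustrative only --
# I don't know whether gllvm does anything like this internally).
set.seed(1)
X   <- cbind(1, matrix(rnorm(100 * 3), 100, 3))  # toy design matrix with intercept
qrX <- qr(X)
Q   <- qr.Q(qrX)  # orthonormal columns, well conditioned even if X is collinear
R   <- qr.R(qrX)  # upper-triangular factor mapping back to the original scale
# If coefficients gamma_hat were estimated against Q, the original-scale
# coefficients would be recovered via beta_hat <- backsolve(R, gamma_hat)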

For example, the following fit to simulated data without multicollinearity gave pretty good estimates as a benchmark.

library(gllvm)
library(mvtnorm)

set.seed(1)

# simulate predictors; uncorrelated here (identity covariance) as the benchmark case
m <- 10  # number of covariates
n <- 100 # number of sites
p <- 10  # number of species

S <- matrix(rep(0, m * m), m, m)
diag(S) <- 1

x <- rmvnorm(n, rep(0, m), S)

beta <- matrix(rnorm(p * (m + 1)), m + 1, p)

eta <- cbind(1, x) %*% beta
rate <- exp(eta)

y <- rpois(length(rate), as.vector(rate))               
y <- matrix(y, n, p)


# fit the model with no latent variables (num.lv = 0)
mod <- gllvm(y, 
             as.data.frame(x), 
             family = poisson(),
             row.eff = FALSE, 
             scale.X = FALSE,
             num.lv = 0)

# compare fitted and true parameters
point_est <- coef(mod)
ci_est    <- confint(mod)

plot(beta, t(cbind(point_est$Intercept, point_est$Xcoef)),
     ylab = "Estimated coef",
     xlab = "True coef")
arrows(t(beta), ci_est[, 1],
       t(beta), ci_est[, 2], 
       col = "grey", length = 0)
abline(0, 1)

[Figure: estimated vs. true coefficients with confidence intervals, uncorrelated predictors]
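As a side note (not part of the original question), the degree of multicollinearity in the simulated design can be quantified directly, e.g. via the condition number of the correlation matrix and the usual variance inflation factors:

# Quantify multicollinearity in the simulated predictors (my own check,
# independent of gllvm): condition number and variance inflation factors.
kappa(cor(x), exact = TRUE)   # close to 1 for the identity covariance
vifs <- diag(solve(cor(x)))   # VIF_j = j-th diagonal of the inverse correlation matrix
round(vifs, 2)                # all close to 1 when the predictors are uncorrelated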

When I simulate predictors that are very strongly correlated, by changing to S <- matrix(rep(0.9, m * m), m, m) (with diag(S) <- 1 still applied), the results don't look as bad as I would have thought:
[Figure: estimated vs. true coefficients with confidence intervals, pairwise correlation 0.9]
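For completeness, this is the only change relative to the benchmark simulation; everything downstream of S is regenerated and the model refitted with the same arguments as above.

# Strongly correlated predictors: pairwise correlation 0.9, unit variances
S <- matrix(rep(0.9, m * m), m, m)
diag(S) <- 1
x <- rmvnorm(n, rep(0, m), S)
# ... then rebuild eta, rate and y, and refit gllvm() exactly as before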

Admittedly the estimates start to drift from the true values (and display variance inflation), but still... not bad. I'm writing a short supplementary section for a project about when these sorts of models break down with increasing multicollinearity, so I'm curious whether gllvm does something clever internally. Thanks!
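For what it's worth, the variance inflation I'd expect in this equicorrelated design can be written down in closed form; the sketch below is my own back-of-the-envelope check, not anything gllvm computes.

# Theoretical VIF for m equicorrelated predictors with pairwise correlation rho:
# VIF = (1 + (m - 2) * rho) / ((1 - rho) * (1 + (m - 1) * rho))
vif_equicor <- function(m, rho) {
  (1 + (m - 2) * rho) / ((1 - rho) * (1 + (m - 1) * rho))
}
vif_equicor(10, 0.9)  # ~9, i.e. standard errors roughly 3x wider than the benchmark
vif_equicor(10, 0)    # 1 in the uncorrelated benchmark case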