simongrund1/mitml

D4 test for geneal linear models fitted with `nlme::gls()`

Closed this issue · 2 comments

Since the intercept-only mixed model can be represented by a general linear model with a compound symmetry covariance structure, the pooled LRT results from both models should be the same. However, when I use testModels function for pooling the LRTs from general linear models, it does not match with the pooling results from corresponding mixed models. Below is the code, where the mixed model is fitted using lme4::lmer function and the general linear model is fitted using nlme:gls function.

library(mitml)
library(nlme)
library(lme4)
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))

fit1_gls <- with(implist, gls(model = ReadAchiev ~ ReadDis, method = "ML",
                          correlation = corCompSymm(form = ~ 1 | ID)))
fit0_gls <- with(implist, gls(model = ReadAchiev ~ 1, method = "ML",
                          correlation = corCompSymm(form = ~ 1 | ID)))

testModels(fit1, fit0, method = "D4", ariv = "positive")
testModels(fit1_gls, fit0_gls, method = "D4", ariv = "positive", data = implist)
# **results do not match**

Note that: the individual completed-data results are the same, as they should be. You can check, for each completed dataset:

# For the 1st completed datasets:
summary(fit1[[1]])
summary(fit1_gls[[1]])
anova(fit1[[1]], fit0[[1]])
anova(fit1_gls[[1]], fit0_gls[[1]])

In addition:

I downloaded the source code for testModels with all its dependencies. Then I rename it to testModels2 and pool the same models.

source("testModels2-with-its-dependencies.R")  # "testModels2-with-its-dependencies.txt" is attached
testModels2(fit1, fit0, method = "D4", ariv = "positive")
testModels2(fit1_gls, fit0_gls, method = "D4", ariv = "positive", data = implist)

Surprisingly, they resulted in a similar output. But it does not seem to be a good result, since the pooled result does not go with the m individual completed data results. Could someone please explain why is this happening?

testModels2-with-its-dependencies.txt
.

I think the issue here is that testModels had no support for gls() with D4. As a result, testModels used the default methods for evaluating the likelihood in the stacked data, which ignores the special role that the cluster variables play in these types of models.

I pushed some changes that should fix this by adding initial support for nlme::gls(). This is what I get now with your code:

library(mitml)
library(nlme)
library(lme4)

set.seed(4352)

data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))

fit1_gls <- with(implist, gls(model = ReadAchiev ~ ReadDis, method = "ML", correlation = corCompSymm(form = ~ 1 | ID)))
fit0_gls <- with(implist, gls(model = ReadAchiev ~ 1, method = "ML", correlation = corCompSymm(form = ~ 1 | ID)))

testModels(fit1, fit0, method = "D4", ariv = "positive")
# 
# Call:
# 
# testModels(model = fit1, null.model = fit0, method = "D4", ariv = "positive")
# 
# Model comparison calculated from 5 imputed data sets.
# Combination method: D4
# 
#     F.value     df1     df2   P(>F)     RIV 
#      51.648       1 343.256   0.000   0.121 
# 
# Data for stacking were automatically extracted from the fitted models.

testModels(fit1_gls, fit0_gls, method = "D4", ariv = "positive", data = implist)
# 
# Call:
# 
# testModels(model = fit1_gls, null.model = fit0_gls, method = "D4", 
#     ariv = "positive", data = implist)
# 
# Model comparison calculated from 5 imputed data sets.
# Combination method: D4
# 
#     F.value     df1     df2   P(>F)     RIV 
#      51.648       1 343.256   0.000   0.121 
# 
# Data for stacking were extracted from the `data` argument.

Thanks. The issue is solved. However, one thing to note is that for gls models, users MUST always provide the list of imputed data sets in the data argument of the testModels(method="D4", ...) function. Because the function nlme::getData() returns the data matrix only if the data that has been used to fit the model is saved in the Environment.