Degrees of freedom with Gaussian model fit via geepack and glmtoolbox
Generalized opened this issue · 5 comments
Hello,
I noticed a discrepancy between how geepack and glmtoolbox GEE engines are supported by emmeans regarding the degrees of freedom.
glmtoolbox::glmgee
When I look into the emm_basis.glmgee
I can notice does a trick - assigns the lm
and glm
classes to the result. This enables the emm_basis.lm
(or glm). Now, if the class is lm (or glm but conditional distribution is Gaussian (or gamma) like in my case), the df.residuals
is returned, otherwise - Inf
.
> emmeans:::emm_basis.glmgee
function (object, trms, xlev, grid, vcov.method = "robust", ...)
{
....
class(object) = c("glm", "lm")
-----------------
> emmeans:::emm_basis.lm
function (object, trms, xlev, grid, ...)
{
......
if (inherits(object, "glm")) {
misc = .std.link.labels(object$family, misc)
dffun = function(k, dfargs) dfargs$df
dfargs = list(df = ifelse(object$family$family %in% c("gaussian", "Gamma"), object$df.residual, Inf)) # <-----
geepack:geeglm
This returns Inf - and this is hardcoded.
I think it would be good to unify the way GEE is handled.
It's even simpler, because the geeglm already offers the df.residual
component.
And the geeglm function assigns also the glm and lm classes: class(out) <- c("geeglm", "gee", "glm", "lm")
Either way (both return Inf or both return residual df) will be fine, but I think they should be consistent...
I'm sorry, I just updated my question, because the original part was irrelevant and I figured it out.
But the second part of my question holds - would it be possible to unify them both to return either Inf or residual.df? geepack (for which Inf is returned) has built in residual.df function, so this would save some work.
This requires no reproducible example. Just information that geepack uses Inf because it's hardcoded in your procedure, and glmtoolbox uses residual.df() from the lm(). So my question is if it's possible that either both returned the same for consistency or both returned residual.df() (geepack offers it natively).
No, I need an example so I can see what it does.
d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L,
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L,
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"),
Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L,
5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L,
4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L,
4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")
geepack::geeglm.
df=Inf because it's hardcoded emmeans:::emm_basis.geeglm
--> dffun = function(k, dfargs) Inf
> library(geepack)
> emmeans( m1 <- geeglm(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"),
id = ID,
corstr = "unstructured",
data=d), specs = ~Visit*Arm)
Visit Arm emmean SE df asymp.LCL asymp.UCL
V1 A 4.11 0.562 Inf 3.01 5.22
V2 A 3.88 0.748 Inf 2.42 5.35
V3 A 4.80 0.416 Inf 3.99 5.62
V1 B 3.22 0.735 Inf 1.78 4.66
V2 B 4.69 0.440 Inf 3.83 5.55
V3 B 4.90 0.575 Inf 3.77 6.02
Covariance estimate used: vbeta
Confidence level used: 0.95
> m$df.residual # geepack natively provides df.residual() out of the box.
[1] 30
glmtoolbox::glmgee
df=30 because it uses lm df.residual()
> library(glmtoolbox)
> emmeans( m2 <- glmgee(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"),
id = ID,
corstr = "unstructured",
data=d), specs = ~Visit*Arm)
Visit Arm emmean SE df lower.CL upper.CL
V1 A 4.09 0.570 30 2.93 5.26
V2 A 3.86 0.744 30 2.34 5.38
V3 A 4.82 0.437 30 3.93 5.71
V1 B 3.25 0.742 30 1.73 4.77
V2 B 4.61 0.433 30 3.73 5.50
V3 B 4.97 0.650 30 3.64 6.29
Covariance estimate used: robust
Confidence level used: 0.95
Warning message:
Estimate of correlation matrix is not positive definite
> df.residual(m2)
[30]
Could handling both be unified in that they return same df?
Either both Inf or both df.residual()
It's not needed now, it can wait when you return in July. Just leaving it to not forget about it.
OK, I made both use df.residual()
, which I hope always works.
> ref_grid(m1) |> summary()
Visit Arm prediction SE df
V1 A 4.11 0.562 30
V2 A 3.88 0.748 30
V3 A 4.80 0.416 30
V1 B 3.22 0.735 30
V2 B 4.69 0.440 30
V3 B 4.90 0.575 30
Covariance estimate used: vbeta
> ref_grid(m2) |> summary()
Visit Arm prediction SE df
V1 A 4.09 0.570 30
V2 A 3.86 0.744 30
V3 A 4.82 0.437 30
V1 B 3.25 0.742 30
V2 B 4.61 0.433 30
V3 B 4.97 0.650 30
Covariance estimate used: robust