Why is `eff_size` output different than a paired samples cohen's d?
Closed this issue · 6 comments
Goal
Compute a Cohen's d effect size from a paired sample with some missing data (i.e., some people only have an observation at time 1 or time 2). I'd like the effect size to be based on a mixed model which does not remove people with incomplete data (the way that typical cohen's d functions in R do).
Issue
In example datasets with complete data, eff_size
is returning significantly different effect sizes than other cohen's d functions in R. Maybe my implementation is wrong, or eff_size
is returning something that's not a standard paired samples Cohen's d?
Reprex
library(tidyverse)
# Create dataset ---------------------------------------------------------------
set.seed(311)
n_people <- 30
outcome_t1 <- rnorm(n_people)
dat <- tibble(
id = rep(1:n_people, times = 2) %>% factor(),
time = rep(1:2, each = n_people) %>% factor(),
outcome = c(outcome_t1, outcome_t1 + rnorm(outcome_t1, mean = 1, sd = .5))
)
# Calculate Cohen's d ----------------------------------------------------------
## Using paired sample method (within = FALSE) ----
## (Both methods yield the same result)
effectsize::cohens_d(x = dat$outcome[1:n_people],
y = dat$outcome[(n_people+1):(2*n_people)],
paired = TRUE)$Cohens_d
#> [1] -2.533599
effsize::cohen.d(formula = outcome ~ time | Subject(id),
data = dat,
paired = TRUE,
within = FALSE)$estimate
#> [1] -2.533599
## Using paired sample method (within = TRUE) ----
effsize::cohen.d(formula = outcome ~ time | Subject(id),
data = dat,
paired = TRUE,
within = TRUE)$estimate
#> [1] -0.7336026
## Based on mixed model followed with emmeans::eff_size ----
mod_mixed <- lme4::lmer(outcome ~ time + (1|id), data = dat)
em_obj <- emmeans::emmeans(mod_mixed, "time")
degrees_of_freedom <- summary(pairs(em_obj))$df
summary(emmeans::eff_size(em_obj, sigma = sigma(mod_mixed),
edf = degrees_of_freedom))$effect.size
#> [1] -3.58305
I am traveling so cannot look into this right away. But Inote you got two different results from effsize. So I wonder which result you were expecting to get from emmeans -- and why.
I think I'm more interested in the second one (i.e., within = TRUE
), but I'm not totally sure since I would've assumed that paired = TRUE
already implies that I'm interested in within-person changes. Any guidance you can provide on that choice is much appreciated, but I just included both to make sure that the deviation from eff_size
was not due to misspecifying that option.
OK, but don't get your hopes up that I can even answer the question. There are many situations where I am not sure that Cohen's d is even unambiguously defined. What if we have heterogenous variances? What if we have more than one source of variation (as I think is true in what we're discussing)?
And then there is the vast realm of knowledge that I simply lack. I don't work with effect sizes, and am easily confused by what is done in other R packages or in articles in various application fields where terminology shifts or is unfamiliar to me...
OK, here's a clue:
> eff_size(em_obj, sigma = sqrt(2)*sigma(mod_mixed), edf = degrees_of_freedom)
contrast effect.size SE df lower.CL upper.CL
time1 - time2 -2.53 0.379 30.8 -3.31 -1.76
sigma used for effect sizes: 0.4124
Degrees-of-freedom method: inherited from kenward-roger when re-gridding
Confidence level used: 0.95
This is the same estimate as
> effsize::cohen.d(formula = outcome ~ time | Subject(id),
+ data = dat,
+ paired = TRUE,
+ within = FALSE)
Cohen's d
d estimate: -2.533599 (large)
95 percent confidence interval:
lower upper
-3.283425 -1.783773
although the CI from emmeans is slightly wider.
The issue here is that with paired data, Cohen's sigma(mod_mixed)
estimates the error SD, which is the SD of the individual errors after adjusting for the factor effects. The SD of the differences is sqrt(2) times that error SD.
I don't understand what effsize::cohen.d
is doing with its within
argument. It seems to be doing the opposite of what seems right to me. We have:
> lme4::VarCorr(mod_mixed)
Groups Name Std.Dev.
id (Intercept) 1.14468
Residual 0.29162
and these SDs are between-subjects and within-subjects, respectively. So if you say within = TRUE
, that suggests we should use .29162 as the error SD -- which is what it actually did when we said within = FALSE
. Moreover, when using paired differences, the between-subjects variance cancels out and so it makes no sense to use the between-subjects SD in any role. I suspect that you should never use paired = TRUE
and within = TRUE
together in effsize::cohen.d
. You might want to ask the effsize developer about that.
As I've mentioned before, I am not sold on any kind of Cohen-d effect size as being appropriate in the context of a mixed model.
Thanks for looking into this! Great to be able to replicate one of the outputs. The issues board doesn’t seem very active on the effsize repo, but will report back if I figure anything out about the within argument
I think this is resolved so am closing this issue. BTW, I added a note about paired data to the documentation for eff_size