rvlenth/emmeans

Comparing each group slope to the slope of all other observations w/ del.eff

walrossker opened this issue · 9 comments

This is a follow up to #469 which compared each factor level's mean to the average of the observations in all other levels. I'd like to replicate that same functionality when comparing slopes from a regression with a factor*continuous interaction. The reprex below compares some emmeans output with a comparable regression model but results still do not match entirely. Is there a problem with my implementation?

library(tidyverse)
library(emmeans)

# Create dataset
n_groups <- 5
n_per_group <- seq(100, 500, 100)
set.seed(3)
dat <- tibble(
  group = map2(1:n_groups, n_per_group, ~ rep(.x, each = .y)) %>% unlist(),
  x = rnorm(sum(n_per_group)),
  y = x*.2 + rnorm(n_groups*n_per_group, mean = group / 10)
) %>% 
  mutate(across(group, factor))

# Visualize independent regression lines in each group
dat %>% 
  ggplot(aes(x, y, color = group))+
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)+
  theme_minimal()

# Run a regression which compares group 1 to everyone else
mod_group_1_vs_everyone_else <- dat %>% 
  mutate(group_1 = group == 1) %>% 
  lm(y ~ x * group_1, data = .)

# The group 1 has a slope that is .217 lower than the slope for other groups
mod_group_1_vs_everyone_else %>% 
  broom::tidy() %>% 
  filter(str_detect(term, "\\:"))
#> # A tibble: 1 × 5
#>   term          estimate std.error statistic p.value
#>   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
#> 1 x:group_1TRUE   -0.217    0.0858     -2.52  0.0117

# group1 effect is similar but slightly different from weighted emtrends del.eff
# i.e., -.2155 vs. the -.217 above
mod <- lm(y ~ x * group, data = dat)
emm <- emmeans(mod, ~ x*group)
emtrends(mod, 
         del.eff ~ group, 
         var = "x", 
         wts = emm@grid$.wgt., 
         adjust = "none")
#> $emtrends
#>  group x.trend     SE   df lower.CL upper.CL
#>  1      0.0131 0.0837 1490   -0.151    0.177
#>  2      0.2538 0.0478 1490    0.160    0.348
#>  3      0.2559 0.0403 1490    0.177    0.335
#>  4      0.1816 0.0363 1490    0.110    0.253
#>  5      0.2398 0.0321 1490    0.177    0.303
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast      estimate     SE   df t.ratio p.value
#>  group1 effect  -0.2155 0.0859 1490  -2.510  0.0122
#>  group2 effect   0.0456 0.0519 1490   0.879  0.3796
#>  group3 effect   0.0520 0.0454 1490   1.145  0.2522
#>  group4 effect  -0.0445 0.0422 1490  -1.054  0.2919
#>  group5 effect   0.0384 0.0394 1490   0.974  0.3303

Why should those results be the same? You are comparing two different models. You would have to demonstrate that the regression coefficient for mod_group_1_vs_everyone_else is equal to your weighted contrast of the group slopes for the model mod -- and that is not obvious, at least to me. Note that mod estimates group means and slopes for each group.

Meanwhile, here's confirmation of the result that emmeans provides, noting that the respective weights for the groups are 100, 200, ..., 500:

> slopediffs = coef(mod)[7:10]    # coefs of x:group2, ..., x:group5
> sum(2:5 * slopediffs) / sum(2:5)
[1] 0.215498

# compared with #
> coef(mod_group_1_vs_everyone_else)
  (Intercept)             x   group_1TRUE x:group_1TRUE 
   0.29630965    0.22964298    0.00181908   -0.21651364 

# and the models don't have the same residuals:
> plot(resid(mod_group_1_vs_everyone_else) ~ resid(mod))

image

Also, I don't see any particular advantage to putting all that "tidy" code (which actually makes things less tidy to my eyes) around a simple result.

I can definitely see why the two models would yield different residuals, but I'm unclear on why they would yield different slopes/slope differences. In #469, the weighted average of other group means is the same as taking the simple average of all observations in those other groups. Here's an extension of the example in #469 that shows this (rounding is different, but I think the two methods return exactly the same value for the cyl4 effect):

library(emmeans)

# Create data
dat <- mtcars
dat$cyl <- factor(dat$cyl)

# Run contrast using weighted avg.
mod <- lm(mpg ~ cyl, data = dat)
emm = emmeans(mod, ~ cyl)
w = emm@grid$.wgt.
contrast(emm, "del.eff", wts = w)
#>  contrast    estimate   SE df t.ratio p.value
#>  cyl4 effect   10.016 1.20 29   8.349  <.0001
#>  cyl6 effect   -0.445 1.38 29  -0.323  0.7490
#>  cyl8 effect   -8.872 1.15 29  -7.725  <.0001
#> 
#> P value adjustment: fdr method for 3 tests

# Run simplified model just comparing cyl == 4 to all other levels
dat$cyl_4 <- dat$cyl == 4
coef(lm(mpg ~ cyl_4, data = dat))
#> (Intercept)   cyl_4TRUE 
#>    16.64762    10.01602

I assumed that this same logic could carry over when comparing group slopes instead of group means. If the del.eff method compares each slope to the slope for "everyone else", isn't that what the interaction term from the simplified mod_group_1_vs_everyone_else in my first reprex represents?

Well, an example is not proof, but a counterexample does constitute disproof. And your original example is a counterexample to what you assumed to be true.

And just to be clear, it is the estimate from mod that correctly estimates those weighted deleted effects. If you go thru the original data and carefully compute the slope of each line, then compute that contrasts, you will get the same numerical result. It is that mod_group_1_vs_everyone_else that does it incorrectly.

I don't think the identical values in my second example are a fluke. Rather my intuition about group means from #469 just does not translate to group slopes with an interaction term. I'm trying to develop a deeper understanding of why they're different, but that might be beyond this discussion.

Your example here has models with both slopes and intercepts -- two of each in mod_group_1_vs_everyone_else and five of each in mod. Moreover, the slopes and intercepts are correlated. Your example in #469 has models with just intercepts; that's much simpler.

Ok, for what it's worth, I tried removing any influence of multiple intercepts by running models with only one intercept each, and the discrepancy between methods persists:

library(emmeans)

# Create data
dat <- mtcars
dat$gear <- factor(dat$gear)

# Estimate gear3 effect using regression with single intercept
mod <- lm(mpg ~ disp + disp:gear, data = dat)
emm <- emmeans(mod, "disp", by = "gear")
emtrends(mod, 
         del.eff ~ gear, 
         var = "disp", 
         wts = emm@grid$.wgt., 
         adjust = "none")
#> $emtrends
#>  gear disp.trend      SE df lower.CL upper.CL
#>  3       -0.0462 0.00573 28  -0.0579  -0.0344
#>  4       -0.0637 0.01539 28  -0.0952  -0.0321
#>  5       -0.0509 0.00956 28  -0.0705  -0.0314
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast     estimate      SE df t.ratio p.value
#>  gear3 effect  0.01373 0.00883 28   1.554  0.1314
#>  gear4 effect -0.01628 0.01100 28  -1.480  0.1500
#>  gear5 effect  0.00301 0.00746 28   0.403  0.6898

# Estimate gear3 effect using regression with single intercept
# but gear3 vs. the others
dat$gear3 <- dat$gear == 3
mod_gear_3_vs_others <- lm(mpg ~ disp + disp:gear3, data = dat)
coef(mod_gear_3_vs_others)
#>    (Intercept)           disp disp:gear3TRUE 
#>   30.593958685   -0.050414895    0.007377983

So even when there is only one intercept, the intuition that "the weighted average of group averages equals the overall average" still doesn't transfer to slopes.

You're headed in the wrong direction. Note your initial try was off, but not very far off -- better than 2 digits accuracy. Leaving terms out of a model constitutes not accounting for the associated effects; but those effects are still there in the data -- they don't vanish just because youdidn't include them in the model. Your example with just means has a simplified model that accounts for all of the effect of group 1 versus the others, while ignoring other effects associated with other groups. Your initial model here accounts for almost all of the desired contrast of slopes, but misses a little bit of it. By further simplifying that model, you account for less of it, not more.

I think we've covered most of the ground here, so am closing this issue.