easystats/parameters

`model_parameters.predictions()` no longer working

strengejacke opened this issue · 11 comments

tagging @vincentarelbundock
I think column names changed ("predicted" no longer available), and maybe also some attributes?

Very sorry about that. 0.9.0 was a big "let's break everything" release. Things should be more stable from now on. I opened a PR: #844

Thanks! I didn't expect an immediate reaction, I was just pinging in case you have an idea how to address this - some time in the future ;-)

Cool cool.

Overall, I have to say I'm still not exactly sure what to think of those methods. The main change in the latest version of the package is that I make a very clear distinction between:

predictions() & avg_predictions()
comparisons() & avg_comparisons()
slopes() & avg_slopes()

There was so much confusion among users about the two-step process of marginaleffects(model) |> summary() that it felt important to distinguish at the function level.

But now the parameters() function only does avg_*() automatically. Maybe that's what we want, but I'm not sure...

Thoughts?

What do the summary() and tidy() methods do?

Both tidy() and summary() average unit-level (conditional) estimates. So avg_comparisons(model) is now the same as summary(comparisons(model)).

The new recommended workflow is to use avg_*() because it is more explicit and everyone kept asking the same question as you: what does summary() do?
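
A minimal sketch of that equivalence, using a throwaway mtcars model for illustration (the model is mine, not from this thread):

library(marginaleffects)

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# one-step workflow: average the unit-level contrasts directly
avg_comparisons(mod)

# two-step workflow: compute unit-level contrasts, then average
comparisons(mod) |> summary()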

Maybe we should just keep this as is so it is consistent with tidy()? My guess is most people will want average estimates, no?

I'm still struggling a bit with the marginaleffects approach, but maybe that's just me and my personal "training" in other packages. I'd say average estimates are not what we usually want. The following examples maybe make the reasons for my confusion clearer:

library(ggeffects)
library(emmeans)
library(marginaleffects)

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$categorical <- as.factor(sample(letters[1:3], 150, TRUE))
m <- lm(Sepal.Length ~ Species * dummy, data = iris)

# nice, predictions all identical!
ggpredict(m, c("Species", "dummy"))
#> # Predicted values of Sepal.Length
#> 
#> # dummy = 0
#> 
#> Species    | Predicted |       95% CI
#> -------------------------------------
#> setosa     |      5.03 | [4.84, 5.21]
#> versicolor |      5.88 | [5.65, 6.11]
#> virginica  |      6.69 | [6.50, 6.88]
#> 
#> # dummy = 1
#> 
#> Species    | Predicted |       95% CI
#> -------------------------------------
#> setosa     |      4.97 | [4.75, 5.20]
#> versicolor |      5.97 | [5.79, 6.15]
#> virginica  |      6.45 | [6.24, 6.67]
emmeans(m, c("Species", "dummy"))
#>  Species    dummy emmean     SE  df lower.CL upper.CL
#>  setosa     0       5.03 0.0939 144     4.84     5.21
#>  versicolor 0       5.88 0.1180 144     5.65     6.11
#>  virginica  0       6.69 0.0972 144     6.50     6.89
#>  setosa     1       4.97 0.1151 144     4.75     5.20
#>  versicolor 1       5.97 0.0924 144     5.79     6.15
#>  virginica  1       6.45 0.1097 144     6.24     6.67
#> 
#> Confidence level used: 0.95
nd <- insight::get_datagrid(m, c("Species", "dummy"))
predictions(m, newdata = nd)
#> 
#>  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.879    0.11804 49.80 < 2.22e-16 5.648  6.110
#>     6.693    0.09724 68.83 < 2.22e-16 6.502  6.883
#>     4.975    0.11506 43.24 < 2.22e-16 4.749  5.201
#>     5.971    0.09241 64.61 < 2.22e-16 5.790  6.152
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#> 
#> Prediction type:  response 
#> Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Species, dummy, Sepal.Length


# now we add a categorical covariate...
m <- lm(Sepal.Length ~ Species * dummy + categorical, data = iris)

# predictions identical
ggpredict(m, c("Species", "dummy"))
#> # Predicted values of Sepal.Length
#> 
#> # dummy = 0
#> 
#> Species    | Predicted |       95% CI
#> -------------------------------------
#> setosa     |      5.10 | [4.87, 5.32]
#> versicolor |      5.92 | [5.67, 6.17]
#> virginica  |      6.72 | [6.51, 6.92]
#> 
#> # dummy = 1
#> 
#> Species    | Predicted |       95% CI
#> -------------------------------------
#> setosa     |      5.06 | [4.80, 5.32]
#> versicolor |      6.02 | [5.82, 6.23]
#> virginica  |      6.50 | [6.26, 6.73]
#> 
#> Adjusted for:
#> * categorical = a
nd <- insight::get_datagrid(m, c("Species", "dummy"))
predictions(m, newdata = nd)
#> 
#>  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>     5.097     0.1157 44.07 < 2.22e-16 4.870  5.324
#>     5.919     0.1285 46.07 < 2.22e-16 5.667  6.170
#>     6.715     0.1045 64.23 < 2.22e-16 6.510  6.920
#>     5.059     0.1319 38.35 < 2.22e-16 4.800  5.317
#>     6.022     0.1042 57.79 < 2.22e-16 5.818  6.226
#>     6.497     0.1213 53.57 < 2.22e-16 6.259  6.735
#> 
#> Prediction type:  response 
#> Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Species, dummy, categorical, Sepal.Length

# averaging over levels of covariate ("marginal means")
emmeans(m, c("Species", "dummy"))
#>  Species    dummy emmean     SE  df lower.CL upper.CL
#>  setosa     0       5.04 0.0942 142     4.85     5.22
#>  versicolor 0       5.86 0.1184 142     5.63     6.09
#>  virginica  0       6.66 0.1002 142     6.46     6.85
#>  setosa     1       5.00 0.1161 142     4.77     5.23
#>  versicolor 1       5.96 0.0928 142     5.78     6.15
#>  virginica  1       6.44 0.1100 142     6.22     6.66
#> 
#> Results are averaged over the levels of: categorical 
#> Confidence level used: 0.95

# only one estimate, although "newdata" is a data grid?
avg_predictions(m, newdata = nd)
#> 
#>  Estimate Std. Error    z   Pr(>|z|) 2.5 % 97.5 %
#>     5.885    0.06811 86.4 < 2.22e-16 5.751  6.018
#> 
#> Prediction type:  response 
#> Columns: type, estimate, std.error, statistic, p.value, conf.low, conf.high

# does not work
marginal_means(m, newdata = nd)
#> Warning: These arguments are not supported for models of class `lm`: newdata.
#> Please file a request on Github if you believe that additional arguments should
#> be supported: https://github.com/vincentarelbundock/marginaleffects/issues
#> Error in (function (model, newdata, type, variables, cross, modeldata, : formal argument "newdata" matched by multiple actual arguments

# not identical to emmeans
marginal_means(m, variables = c("Species", "dummy"))
#> 
#>     Term      Value  Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>  Species     setosa 5.019    0.07497 66.95 < 2.22e-16 4.872  5.166
#>  Species versicolor 5.911    0.07546 78.33 < 2.22e-16 5.763  6.059
#>  Species  virginica 6.547    0.07519 87.07 < 2.22e-16 6.400  6.694
#>    dummy          0 5.851    0.06044 96.80 < 2.22e-16 5.733  5.970
#>    dummy          1 5.800    0.06144 94.40 < 2.22e-16 5.680  5.921
#> 
#> Prediction type:  response 
#> Results averaged over levels of: Species, dummy, categorical 
#> Columns: type, term, value, estimate, std.error, statistic, p.value, conf.low, conf.high

# finally, same as emmeans
marginal_means(m, cross = TRUE, variables = c("Species", "dummy"))
#> 
#>   Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>  5.038    0.09419 53.49 < 2.22e-16 4.853  5.223
#>  5.000    0.11609 43.07 < 2.22e-16 4.772  5.227
#>  5.860    0.11842 49.48 < 2.22e-16 5.628  6.092
#>  5.963    0.09282 64.24 < 2.22e-16 5.781  6.145
#>  6.656    0.10015 66.46 < 2.22e-16 6.460  6.852
#>  6.438    0.10997 58.54 < 2.22e-16 6.222  6.653
#> 
#> Prediction type:  response 
#> Results averaged over levels of: categorical 
#> Columns: type, Species, dummy, estimate, std.error, statistic, p.value, conf.low, conf.high
  1. why can't I use newdata in marginal_means(), too?

  2. what is the difference between the newdata and variables arguments? variables in predictions() behaves completely differently than in marginal_means(), and the equivalent of variables in marginal_means() seems to be newdata in predictions()

library(marginaleffects)

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$categorical <- as.factor(sample(letters[1:3], 150, TRUE))

m <- lm(Sepal.Length ~ Species * dummy, data = iris)
nd <- insight::get_datagrid(m, c("Species", "dummy"))
marginal_means(m, variables = c("Species", "dummy"))
#> 
#>     Term      Value  Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>  Species     setosa 5.001    0.07427 67.33 < 2.22e-16 4.855  5.146
#>  Species versicolor 5.925    0.07496 79.04 < 2.22e-16 5.778  6.072
#>  Species  virginica 6.574    0.07330 89.69 < 2.22e-16 6.430  6.717
#>    dummy          0 5.866    0.05983 98.05 < 2.22e-16 5.749  5.983
#>    dummy          1 5.800    0.06129 94.63 < 2.22e-16 5.680  5.920
#> 
#> Prediction type:  response 
#> Results averaged over levels of: Species, dummy 
#> Columns: type, term, value, estimate, std.error, statistic, p.value, conf.low, conf.high

predictions(m, variables = c("Species", "dummy"))
#> 
#>  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#> --- 890 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670 
#> 
#> Prediction type:  response 
#> Columns: rowid, rowidcf, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Sepal.Length, Species, dummy

marginal_means(m, newdata = nd)
#> Warning: These arguments are not supported for models of class `lm`: newdata.
#> Please file a request on Github if you believe that additional arguments should
#> be supported: https://github.com/vincentarelbundock/marginaleffects/issues
#> Error in (function (model, newdata, type, variables, cross, modeldata, : formal argument "newdata" matched by multiple actual arguments

predictions(m, newdata = nd)
#> 
#>  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>     5.027    0.09394 53.51 < 2.22e-16 4.843  5.211
#>     5.879    0.11804 49.80 < 2.22e-16 5.648  6.110
#>     6.693    0.09724 68.83 < 2.22e-16 6.502  6.883
#>     4.975    0.11506 43.24 < 2.22e-16 4.749  5.201
#>     5.971    0.09241 64.61 < 2.22e-16 5.790  6.152
#>     6.455    0.10970 58.84 < 2.22e-16 6.240  6.670
#> 
#> Prediction type:  response 
#> Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Species, dummy, Sepal.Length
  3. why do I need cross = TRUE when interactions are involved, although my data grid contains all interaction terms? Calculating marginal means per predictor when an interaction is involved usually leads to misleading results, I think.

and

  4. My intuitive guess is that variables is meant to specify focal predictors, for which a reference grid is automatically calculated. This works for marginal_means(), but not for predictions(). The former creates a data grid based on variables, the latter does not.

library(marginaleffects)

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$categorical <- as.factor(sample(letters[1:3], 150, TRUE))

m <- lm(Sepal.Length ~ Species * dummy + categorical, data = iris)

marginal_means(m, variables = c("Species", "dummy"))
#> 
#>     Term      Value  Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>  Species     setosa 5.019    0.07497 66.95 < 2.22e-16 4.872  5.166
#>  Species versicolor 5.911    0.07546 78.33 < 2.22e-16 5.763  6.059
#>  Species  virginica 6.547    0.07519 87.07 < 2.22e-16 6.400  6.694
#>    dummy          0 5.851    0.06044 96.80 < 2.22e-16 5.733  5.970
#>    dummy          1 5.800    0.06144 94.40 < 2.22e-16 5.680  5.921
#> 
#> Prediction type:  response 
#> Results averaged over levels of: Species, dummy, categorical 
#> Columns: type, term, value, estimate, std.error, statistic, p.value, conf.low, conf.high
marginal_means(m)
#> 
#>         Term      Value  Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>      Species     setosa 5.019    0.07497 66.95 < 2.22e-16 4.872  5.166
#>      Species versicolor 5.911    0.07546 78.33 < 2.22e-16 5.763  6.059
#>      Species  virginica 6.547    0.07519 87.07 < 2.22e-16 6.400  6.694
#>        dummy          0 5.851    0.06044 96.80 < 2.22e-16 5.733  5.970
#>        dummy          1 5.800    0.06144 94.40 < 2.22e-16 5.680  5.921
#>  categorical          a 5.885    0.06811 86.40 < 2.22e-16 5.751  6.018
#>  categorical          b 5.869    0.07583 77.40 < 2.22e-16 5.721  6.018
#>  categorical          c 5.723    0.08086 70.78 < 2.22e-16 5.565  5.881
#> 
#> Prediction type:  response 
#> Results averaged over levels of: Species, dummy, categorical 
#> Columns: type, term, value, estimate, std.error, statistic, p.value, conf.low, conf.high
marginal_means(m, variables = c("Species", "dummy"), cross = TRUE)
#> 
#>   Mean Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#>  5.038    0.09419 53.49 < 2.22e-16 4.853  5.223
#>  5.000    0.11609 43.07 < 2.22e-16 4.772  5.227
#>  5.860    0.11842 49.48 < 2.22e-16 5.628  6.092
#>  5.963    0.09282 64.24 < 2.22e-16 5.781  6.145
#>  6.656    0.10015 66.46 < 2.22e-16 6.460  6.852
#>  6.438    0.10997 58.54 < 2.22e-16 6.222  6.653
#> 
#> Prediction type:  response 
#> Results averaged over levels of: categorical 
#> Columns: type, Species, dummy, estimate, std.error, statistic, p.value, conf.low, conf.high
marginal_means(m, cross = TRUE)
#> Error: When `cross = TRUE`, you must use the `variables` argument to specify
#>   which variables should be interacted.

I think marginaleffects would be easier and more accessible if there were some functions that take care of the above-mentioned points (addressing "casual" users), and some functions with more options, addressing the needs of more advanced users.

I think the arguments should be (see the sketch after this list):

  • model
  • focal predictors
  • representative values (optional, choose a sensible default like mean/sd for numeric and all levels for categorical)
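
A rough sketch of what such a simplified wrapper could look like, built on top of the existing predictions() and insight::get_datagrid(); the function easy_predictions() and its arguments are hypothetical, not part of any package:

library(marginaleffects)

# hypothetical wrapper: focal predictors plus an automatic reference grid
easy_predictions <- function(model, focal, at = NULL) {
  # `at` would hold user-supplied representative values; not implemented here.
  # insight::get_datagrid() already picks sensible defaults: all levels for
  # categorical focal predictors, other predictors held at typical values.
  grid <- insight::get_datagrid(model, focal)
  predictions(model, newdata = grid)
}

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
m <- lm(Sepal.Length ~ Species * dummy, data = iris)
easy_predictions(m, focal = c("Species", "dummy"))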

Another example would be that it took me quite long to figure out how to do simple pairwise comparisons:

library(emmeans)
library(marginaleffects)

data(iris)
set.seed(123)
iris$fac_b <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$fac_a <- as.factor(rbinom(150, 1, prob = c(.6, .4)))
m <- lm(Sepal.Length ~ fac_a * fac_b, data = iris)
nd <- insight::get_datagrid(m, c("fac_a", "fac_b"))

# classical pairwise comparison
emmeans(m, c("fac_a", "fac_b")) |> pairs()
#>  contrast                      estimate    SE  df t.ratio p.value
#>  fac_a0 fac_b0 - fac_a1 fac_b0  -0.3195 0.189 146  -1.689  0.3332
#>  fac_a0 fac_b0 - fac_a0 fac_b1  -0.0953 0.196 146  -0.486  0.9622
#>  fac_a0 fac_b0 - fac_a1 fac_b1  -0.2531 0.195 146  -1.298  0.5655
#>  fac_a1 fac_b0 - fac_a0 fac_b1   0.2242 0.188 146   1.194  0.6315
#>  fac_a1 fac_b0 - fac_a1 fac_b1   0.0665 0.186 146   0.357  0.9844
#>  fac_a0 fac_b1 - fac_a1 fac_b1  -0.1577 0.194 146  -0.815  0.8473
#> 
#> P value adjustment: tukey method for comparing a family of 4 estimates

marginaleffects::comparisons(
  m,
  newdata = nd,
  cross = FALSE,
  variables = c("fac_a", "fac_b")
)
#> 
#>   Term Contrast Estimate Std. Error       z Pr(>|z|)    2.5 % 97.5 %
#>  fac_a    1 - 0  0.31952     0.1892  1.6889 0.091239 -0.05128 0.6903
#>  fac_a    1 - 0  0.31952     0.1892  1.6889 0.091239 -0.05128 0.6903
#>  fac_a    1 - 0  0.15773     0.1935  0.8151 0.415027 -0.22156 0.5370
#>  fac_a    1 - 0  0.15773     0.1935  0.8151 0.415027 -0.22156 0.5370
#>  fac_b    1 - 0  0.09532     0.1962  0.4858 0.627142 -0.28928 0.4799
#>  fac_b    1 - 0 -0.06647     0.1864 -0.3567 0.721350 -0.43177 0.2988
#>  fac_b    1 - 0  0.09532     0.1962  0.4858 0.627142 -0.28928 0.4799
#>  fac_b    1 - 0 -0.06647     0.1864 -0.3567 0.721350 -0.43177 0.2988
#> 
#> Prediction type:  response 
#> Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, fac_a, fac_b, Sepal.Length

marginaleffects::comparisons(
  m,
  newdata = nd,
  cross = TRUE,
  variables = c("fac_a", "fac_b")
)
#> 
#>  C: fac_a C: fac_b Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#> 
#> Prediction type:  response 
#> Columns: rowid, type, term, contrast_fac_a, contrast_fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, fac_a, Sepal.Length, fac_b

marginaleffects::comparisons(
  m,
  cross = TRUE,
  variables = c("fac_a", "fac_b")
)
#> 
#>  C: fac_a C: fac_b Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#> --- 140 rows omitted. See ?avg_comparisons and ?print.marginaleffects --- 
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351 
#> 
#> Prediction type:  response 
#> Columns: rowid, type, term, contrast_fac_a, contrast_fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, fac_a, fac_b

marginaleffects::comparisons(
  m,
  newdata = nd,
  cross = TRUE,
  variables = c("fac_a", "fac_b"),
  by = TRUE
)
#> 
#>  C: fac_a C: fac_b Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast_fac_a, contrast_fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high

marginaleffects::comparisons(
  m,
  cross = TRUE,
  variables = c("fac_a", "fac_b"),
  by = TRUE
)
#> 
#>  C: fac_a C: fac_b Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#>     1 - 0    1 - 0   0.2531     0.1949 1.298   0.1942 -0.129 0.6351
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast_fac_a, contrast_fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high

marginaleffects::comparisons(
  m,
  newdata = nd,
  by = TRUE
)
#> 
#>   Term Contrast Estimate Std. Error      z Pr(>|z|)    2.5 % 97.5 %
#>  fac_a    1 - 0  0.23863     0.1353 1.7635  0.07782 -0.02659 0.5038
#>  fac_b    1 - 0  0.01442     0.1353 0.1066  0.91512 -0.25079 0.2796
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

# finally?
marginaleffects::comparisons(
  m,
  variables = "fac_a",
  by = "fac_b"
)
#> 
#>   Term          Contrast fac_b Estimate Std. Error      z Pr(>|z|)    2.5 %
#>  fac_a mean(1) - mean(0)     0   0.3195     0.1892 1.6889 0.091239 -0.05128
#>  fac_a mean(1) - mean(0)     1   0.1577     0.1935 0.8151 0.415027 -0.22156
#>  97.5 %
#>  0.6903
#>  0.5370
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast, fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

# do I need `cross = TRUE`?
marginaleffects::comparisons(
  m,
  cross = TRUE,
  variables = "fac_a",
  by = "fac_b"
)
#> 
#>           C: fac_a fac_b Estimate Std. Error      z Pr(>|z|)    2.5 % 97.5 %
#>  mean(1) - mean(0)     0   0.3195     0.1892 1.6889 0.091239 -0.05128 0.6903
#>  mean(1) - mean(0)     1   0.1577     0.1935 0.8151 0.415027 -0.22156 0.5370
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast_fac_a, fac_b, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

Created on 2023-02-03 with reprex v2.0.2

The logic behind your API is quite different from that of emmeans, so I guess it's a lot of "getting used to it", but some aspects still do not seem consistent and clear to me, at least at first glance. Maybe I need to play around more with marginaleffects so that the aha effect occurs ;-)

There are two issues here:

  1. The marginal_means() function has a bad user interface, its arguments are inconsistent with same-name arguments in other marginaleffects functions, and its behavior is sometimes counter-intuitive.
  2. You don’t grok the general “concept” of marginaleffects.

You are 100% right w.r.t. point 1. Frankly, marginal_means() is currently a second-class citizen, and I should invest in a rethink. But let's put marginal_means() aside for now. I want to make sure that you understand the philosophy of the other marginaleffects functions.

I break down the scientific workflow into 4 distinct steps:

  1. ESTIMAND. What quantity do you care about? Prediction, comparison, or slope. Note that “comparison” captures a very broad array of statistics, including contrasts, risk ratios, risk differences, log odds ratios, etc. In marginaleffects, a “comparison” is just a function of two predictions.
    • User choice: predictions(), slopes(), or comparisons()
  2. GRID. Where do you want to evaluate the estimand in the predictor space? All those quantities are "conditional", in the sense that their values depend on the values of the predictors where we evaluate them.
    • User choice: newdata argument
  3. AVERAGING. Do you want to “marginalize” or “average” the estimates which were calculated in every cell of the grid? If so, do you want 1 total average, or averages by groups?
    • User choice: avg_*() function and by argument.
  4. HYPOTHESIS. What hypothesis test do you want to run?
    • hypothesis argument

Model

library(emmeans)
library(marginaleffects)

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$categorical <- as.factor(sample(letters[1:3], 150, TRUE))
m <- lm(Sepal.Length ~ Species * dummy + Petal.Width, data = iris)

Step 1: Estimand

What quantity do I care about? In this example, my scientific interest lies in predictions, so I use the predictions() function:

predictions(m)
# 
#  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#     4.980    0.08791 56.64 < 2.22e-16 4.807  5.152
#     4.980    0.08791 56.64 < 2.22e-16 4.807  5.152
#     4.980    0.08791 56.64 < 2.22e-16 4.807  5.152
#     4.980    0.08791 56.64 < 2.22e-16 4.807  5.152
#     4.937    0.10729 46.02 < 2.22e-16 4.727  5.148
# --- 140 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#     6.686    0.11267 59.34 < 2.22e-16 6.465  6.907
#     6.595    0.09266 71.17 < 2.22e-16 6.414  6.777
#     6.403    0.10258 62.42 < 2.22e-16 6.202  6.604
#     6.972    0.10729 64.99 < 2.22e-16 6.762  7.183
#     6.501    0.09875 65.83 < 2.22e-16 6.307  6.694 
# 
# Prediction type:  response 
# Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, dummy, categorical

Predictions are a conditional quantity: their values depend on the values of all the predictors. By default, marginaleffects thus returns one prediction per row of the original dataset.

Step 2: Grid

Where do I want to evaluate the estimand in the predictor space? In this case, I don't care about the empirical distribution of predictors. Instead, I want to make predictions on a balanced grid of categorical predictors, with continuous predictors held at their means:

predictions(m,                                          # Step 1: Estimand
    newdata = datagrid(Species = unique, dummy = 0:1))  # Step 2: Grid
# 
#  Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %    Species dummy
#     5.922    0.20448 28.96 < 2.22e-16 5.521  6.323     setosa     0
#     5.880    0.21529 27.31 < 2.22e-16 5.458  6.302     setosa     1
#     5.809    0.11074 52.45 < 2.22e-16 5.592  6.026 versicolor     0
#     5.821    0.09134 63.73 < 2.22e-16 5.642  6.000 versicolor     1
#     5.934    0.18085 32.81 < 2.22e-16 5.580  6.289  virginica     0
#     5.648    0.19531 28.92 < 2.22e-16 5.265  6.031  virginica     1
# 
# Prediction type:  response 
# Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, Sepal.Length, Petal.Width, Species, dummy

Equivalent to:

emmeans(m, c("Species", "dummy"))
#  Species    dummy emmean     SE  df lower.CL upper.CL
#  setosa     0       5.92 0.2045 143     5.52     6.33
#  versicolor 0       5.81 0.1107 143     5.59     6.03
#  virginica  0       5.93 0.1809 143     5.58     6.29
#  setosa     1       5.88 0.2153 143     5.45     6.31
#  versicolor 1       5.82 0.0913 143     5.64     6.00
#  virginica  1       5.65 0.1953 143     5.26     6.03
# 
# Confidence level used: 0.95

Step 3: Averaging

Do I want to marginalize? If so, I use the avg_*() companion function and the by argument:

avg_predictions(m,                                        # Step 1: Estimand
    newdata = datagrid(Species = unique, dummy = 0:1),    # Step 2: Grid
    by = "Species")                                       # Step 3: Marginalization
# 
#     Species Estimate Std. Error     z   Pr(>|z|) 2.5 % 97.5 %
#      setosa    5.901    0.19826 29.76 < 2.22e-16 5.512  6.290
#  versicolor    5.815    0.07332 79.31 < 2.22e-16 5.671  5.959
#   virginica    5.791    0.17537 33.02 < 2.22e-16 5.447  6.135
# 
# Prediction type:  response 
# Columns: type, Species, estimate, std.error, statistic, p.value, conf.low, conf.high

Equivalent to:

emmeans(m, "Species")
# NOTE: Results may be misleading due to involvement in interactions
#  Species    emmean     SE  df lower.CL upper.CL
#  setosa       5.90 0.1983 143     5.51     6.29
#  versicolor   5.82 0.0733 143     5.67     5.96
#  virginica    5.79 0.1754 143     5.44     6.14
# 
# Results are averaged over the levels of: dummy 
# Confidence level used: 0.95

Step 4: Hypothesis tests

What hypothesis test do I want to run? Pairwise comparisons:

avg_predictions(m,                                        # Step 1: Estimand
    newdata = datagrid(Species = unique, dummy = 0:1),    # Step 2: Grid
    by = "Species",                                       # Step 3: Marginalization
    hypothesis = "pairwise")                              # Step 4: Hypothesis tests
# 
#                    Term Estimate Std. Error      z Pr(>|z|)   2.5 % 97.5 %
#     setosa - versicolor  0.08602     0.2305 0.3732  0.70899 -0.3657 0.5377
#      setosa - virginica  0.11001     0.3607 0.3050  0.76039 -0.5970 0.8170
#  versicolor - virginica  0.02399     0.1697 0.1414  0.88756 -0.3086 0.3566
# 
# Prediction type:  response 
# Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

Equivalent to:

emmeans(m, "Species") |> contrast("pairwise")
#  contrast               estimate    SE  df t.ratio p.value
#  setosa - versicolor       0.086 0.230 143   0.373  0.9261
#  setosa - virginica        0.110 0.361 143   0.305  0.9500
#  versicolor - virginica    0.024 0.170 143   0.141  0.9890
# 
# Results are averaged over the levels of: dummy 
# P value adjustment: tukey method for comparing a family of 3 estimates

Or take the ratio of the first and second averaged predictions:

avg_predictions(m,                                        # Step 1: Estimand
    newdata = datagrid(Species = unique, dummy = 0:1),    # Step 2: Grid
    by = "Species",                                       # Step 3: Marginalization
    hypothesis = "b1 / b2 = 0")                             # Step 4: Hypothesis tests
# 
#     Term Estimate Std. Error     z   Pr(>|z|)  2.5 % 97.5 %
#  b1/b2=0    1.015    0.03974 25.54 < 2.22e-16 0.9369  1.093
# 
# Prediction type:  response 
# Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

Does this make sense?

Ok, that makes it clearer, thank you very much! Why do you prefer avg_predictions() over comparisons() when making comparisons?

I could now also figure out the following "translations" from emmeans to marginaleffects:

library(emmeans)
library(marginaleffects)

data(iris)
set.seed(123)
iris$dummy <- as.factor(rbinom(150, 1, prob = c(.3, .7)))
iris$categorical <- as.factor(sample(letters[1:3], 150, TRUE))
m <- lm(Sepal.Length ~ Species * dummy + Petal.Width, data = iris)

emmeans(m, "Species", by = "dummy") |> pairs()
#> dummy = 0:
#>  contrast               estimate    SE  df t.ratio p.value
#>  setosa - versicolor      0.1133 0.244 143   0.465  0.8878
#>  setosa - virginica      -0.0121 0.364 143  -0.033  0.9994
#>  versicolor - virginica  -0.1254 0.201 143  -0.624  0.8074
#> 
#> dummy = 1:
#>  contrast               estimate    SE  df t.ratio p.value
#>  setosa - versicolor      0.0587 0.257 143   0.228  0.9718
#>  setosa - virginica       0.2321 0.383 143   0.606  0.8171
#>  versicolor - virginica   0.1734 0.190 143   0.911  0.6340
#> 
#> P value adjustment: tukey method for comparing a family of 3 estimates

avg_predictions(m,
    newdata = datagrid(Species = unique, dummy = 1),
    by = "Species",
    hypothesis = "pairwise")  
#> 
#>                    Term Estimate Std. Error      z Pr(>|z|)   2.5 % 97.5 %
#>     setosa - versicolor  0.05869     0.2574 0.2280  0.81963 -0.4458 0.5631
#>      setosa - virginica  0.23209     0.3830 0.6059  0.54457 -0.5187 0.9828
#>  versicolor - virginica  0.17340     0.1902 0.9115  0.36205 -0.1995 0.5463
#> 
#> Prediction type:  response 
#> Columns: rowid, type, term, estimate, std.error, statistic, p.value, conf.low, conf.high



emmeans(m, c("Species", "dummy")) |> pairs()
#>  contrast                              estimate    SE  df t.ratio p.value
#>  setosa dummy0 - versicolor dummy0       0.1133 0.244 143   0.465  0.9972
#>  setosa dummy0 - virginica dummy0       -0.0121 0.364 143  -0.033  1.0000
#>  setosa dummy0 - setosa dummy1           0.0422 0.138 143   0.306  0.9996
#>  setosa dummy0 - versicolor dummy1       0.1009 0.248 143   0.407  0.9985
#>  setosa dummy0 - virginica dummy1        0.2743 0.376 143   0.729  0.9780
#>  versicolor dummy0 - virginica dummy0   -0.1254 0.201 143  -0.624  0.9891
#>  versicolor dummy0 - setosa dummy1      -0.0711 0.253 143  -0.281  0.9998
#>  versicolor dummy0 - versicolor dummy1  -0.0124 0.140 143  -0.089  1.0000
#>  versicolor dummy0 - virginica dummy1    0.1610 0.214 143   0.754  0.9746
#>  virginica dummy0 - setosa dummy1        0.0543 0.371 143   0.146  1.0000
#>  virginica dummy0 - versicolor dummy1    0.1130 0.177 143   0.638  0.9879
#>  virginica dummy0 - virginica dummy1     0.2864 0.137 143   2.095  0.2958
#>  setosa dummy1 - versicolor dummy1       0.0587 0.257 143   0.228  0.9999
#>  setosa dummy1 - virginica dummy1        0.2321 0.383 143   0.606  0.9905
#>  versicolor dummy1 - virginica dummy1    0.1734 0.190 143   0.911  0.9430
#> 
#> P value adjustment: tukey method for comparing a family of 6 estimates

avg_predictions(m,
  newdata = datagrid(Species = unique, dummy = 0:1),
  by = c("Species", "dummy"),
  hypothesis = "pairwise"
)
#> 
#>                         Term Estimate Std. Error        z Pr(>|z|)    2.5 %
#>          setosa,0 - setosa,1  0.04223     0.1382  0.30568 0.759850 -0.22856
#>      setosa,0 - versicolor,0  0.11335     0.2438  0.46496 0.641960 -0.36445
#>      setosa,0 - versicolor,1  0.10092     0.2482  0.40668 0.684242 -0.38546
#>       setosa,0 - virginica,0 -0.01207     0.3639 -0.03317 0.973539 -0.72530
#>       setosa,0 - virginica,1  0.27432     0.3762  0.72919 0.465888 -0.46303
#>      setosa,1 - versicolor,0  0.07111     0.2530  0.28104 0.778676 -0.42482
#>      setosa,1 - versicolor,1  0.05869     0.2574  0.22802 0.819631 -0.44577
#>       setosa,1 - virginica,0 -0.05430     0.3709 -0.14641 0.883599 -0.78127
#>       setosa,1 - virginica,1  0.23209     0.3830  0.60591 0.544571 -0.51866
#>  versicolor,0 - versicolor,1 -0.01243     0.1404 -0.08850 0.929478 -0.28760
#>   versicolor,0 - virginica,0 -0.12542     0.2011 -0.62370 0.532826 -0.51954
#>   versicolor,0 - virginica,1  0.16098     0.2135  0.75396 0.450875 -0.25750
#>   versicolor,1 - virginica,0 -0.11299     0.1771 -0.63797 0.523495 -0.46012
#>   versicolor,1 - virginica,1  0.17340     0.1902  0.91147 0.362048 -0.19947
#>    virginica,0 - virginica,1  0.28639     0.1367  2.09498 0.036173  0.01846
#>  97.5 %
#>  0.3130
#>  0.5911
#>  0.5873
#>  0.7012
#>  1.0117
#>  0.5670
#>  0.5631
#>  0.6727
#>  0.9828
#>  0.2628
#>  0.2687
#>  0.5795
#>  0.2341
#>  0.5463
#>  0.5543
#> 
#> Prediction type:  response 
#> Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

Created on 2023-02-03 with reprex v2.0.2

Why do you prefer avg_predictions() over comparisons() when making comparisons?

I don't! Normally, I would use comparisons() for this, but I just wanted to illustrate a "ground-up" approach to the 4 steps.

The killer feature of marginaleffects is that it is entirely "composable". So you can use the same 4 steps everywhere. For example, if you use comparisons() and hypothesis = "pairwise", you get a difference in differences:

avg_comparisons(m, hypothesis = "pairwise")
#                                          Term Estimate Std. Error      z
#  (versicolor - setosa) - (virginica - setosa)   -0.652      0.103 -6.333
#    Pr(>|z|)   2.5 %  97.5 %
#  2.4093e-10 -0.8538 -0.4502
# 
# Prediction type:  response 
# Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

But comparisons() also allows you to do a bunch of more powerful things, like risk ratios, or odds, or pairwise differences in odds, or totally custom comparisons between predictions.
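
For example, a sketch of a ratio-style comparison using the model m from above; note that depending on your marginaleffects version, this argument is called comparison (newer releases) or transform_pre (older releases):

# ratio of the two counterfactual predictions instead of a difference
avg_comparisons(m, variables = "Species", comparison = "ratio")

# the same ratio, written as a custom function of two predictions
avg_comparisons(m, variables = "Species", comparison = function(hi, lo) hi / lo)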