vincentarelbundock/marginaleffects

Incorrect results in "Predictions" vignette?

Closed this issue · 6 comments

In the "Predictions" vignette, you say:

In the next example, we create a “counterfactual” data grid where each observation of the dataset is repeated twice, with different values of the am variable, and all other variables held at the observed values. We also show the equivalent results using dplyr.

But the results are actually not the same:

suppressPackageStartupMessages({
  library(dplyr)
  library(marginaleffects)
})
packageVersion("marginaleffects")
#> [1] '0.9.9101'

mod <- glm(vs ~ hp + am, data = mtcars, family = binomial)

predictions(
  mod,
  by = "am",
  newdata = datagridcf(am = 0:1))
#> 
#>  am Estimate Pr(>|z|)    2.5 % 97.5 %
#>   0  0.24043   0.3922 2.22e-02  0.815
#>   1  0.00696   0.0359 6.81e-05  0.419
#> 
#> Prediction type:  link 
#> Columns: type, am, estimate, p.value, conf.low, conf.high

predictions(
  mod,
  newdata = datagridcf(am = 0:1)) |>
  group_by(am) |>
  summarize(AAP = mean(estimate))
#> # A tibble: 2 × 2
#>      am   AAP
#>   <int> <dbl>
#> 1     0 0.526
#> 2     1 0.330

Note that this difference is not present in 0.9.0.

I don't know the exact cause, but the results have been wrong since 0583554 (11 Feb.)

Thanks.

This is because of backtransformation. I think the change is desirable, and it is now documented in the main predictions() docs, as well as in the section just above the one you linked to.
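The difference can be reproduced by hand. The sketch below is my own illustration of the backtransformation behaviour, not the package's internal code: the `by = "am"` path averages on the link scale and then backtransforms, while the dplyr path averages predictions that were already backtransformed.

```r
# Mimic the two aggregation strategies by hand (illustration only).
mod <- glm(vs ~ hp + am, data = mtcars, family = binomial)

# Counterfactual grid by hand: every observed hp crossed with am = 0 and 1
grid <- expand.grid(hp = mtcars$hp, am = 0:1)
eta <- predict(mod, newdata = grid, type = "link")  # linear predictors

# `by = "am"` route: average on the link scale, then backtransform
plogis(tapply(eta, grid$am, mean))
# should match the by = "am" output above (0.24043, 0.00696)

# dplyr route: backtransform each prediction, then average
tapply(plogis(eta), grid$am, mean)
# should match the dplyr summary above (0.526, 0.330)
```

Because `plogis()` is nonlinear, the two orders of operations generally give different answers, which is exactly the discrepancy reported in the issue.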

I modified the vignette as shown here: c5597ea

I don't know the exact cause, but the results have been wrong since 0583554 (11 Feb.)

Yeah, that was a big change. I dropped the use of insight::get_predicted() altogether. Prediction is such an integral part of marginaleffects that we need to handle it entirely internally.

great, thank you for the clarification

Just a quick question/comment: it seems that this introduces an inconsistency with avg_comparisons(). Before the change, the results of avg_comparisons() were consistent with avg_predictions(). Now they are only consistent with avg_predictions(type = "response").

library(marginaleffects)

mtcars$gear <- factor(mtcars$gear)
mod <- glm(vs ~ gear + mpg, data = mtcars, family = binomial)

comp <- avg_comparisons(mod)
comp
#> 
#>  Term Contrast Estimate Std. Error      z Pr(>|z|)   2.5 % 97.5 %
#>  gear    4 - 3   0.0372     0.1366  0.272    0.785 -0.2305  0.305
#>  gear    5 - 3  -0.3397     0.0988 -3.437   <0.001 -0.5334 -0.146
#>  mpg     +1      0.0608     0.0128  4.736   <0.001  0.0356  0.086
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

pred <- avg_predictions(mod, variables = "gear", by = "gear", type = "response")
pred
#> 
#>  gear Estimate Std. Error    z Pr(>|z|)  2.5 % 97.5 %
#>     3    0.473     0.0830 5.70   <0.001 0.3101  0.635
#>     4    0.510     0.0992 5.14   <0.001 0.3156  0.704
#>     5    0.133     0.0520 2.56   0.0105 0.0311  0.235
#> 
#> Prediction type:  response 
#> Columns: type, gear, estimate, std.error, statistic, p.value, conf.low, conf.high
pred$estimate[2] - pred$estimate[1]
#> [1] 0.03717842
comp$estimate[1]
#> [1] 0.03717842

pred <- avg_predictions(mod, variables = "gear", by = "gear")
pred
#> 
#>  gear Estimate Pr(>|z|)    2.5 % 97.5 %
#>     3  0.56493    0.782 1.70e-01  0.892
#>     4  0.65704    0.495 2.28e-01  0.925
#>     5  0.00386    0.061 1.16e-05  0.564
#> 
#> Prediction type:  link 
#> Columns: type, gear, estimate, p.value, conf.low, conf.high
pred$estimate[2] - pred$estimate[1]
#> [1] 0.09210965
comp$estimate[1]
#> [1] 0.03717842

Created on 2023-02-24 with reprex v2.0.2
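For what it's worth, the comparisons side can be reproduced by averaging response-scale contrasts by hand. This is my own sketch of what avg_comparisons() computes for a factor contrast, not the package's internal code, and it shows why it lines up with avg_predictions(type = "response"):

```r
# Manual version of the gear 4 - 3 contrast: predict on the response scale
# with gear forced to 4, then to 3, and average the differences.
# (Illustration only; not the package's internal code.)
mtcars$gear <- factor(mtcars$gear)
mod <- glm(vs ~ gear + mpg, data = mtcars, family = binomial)

g3 <- g4 <- mtcars
g3$gear <- factor("3", levels = levels(mtcars$gear))
g4$gear <- factor("4", levels = levels(mtcars$gear))

mean(predict(mod, g4, type = "response") - predict(mod, g3, type = "response"))
# should reproduce comp$estimate[1] above (0.03717842)
```

Since the mean of differences equals the difference of means, this is identical to subtracting the two response-scale averages, but not to subtracting the backtransformed link-scale averages.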

Just a quick question/comment: it seems that this introduces an inconsistency with avg_comparisons(). Before the change, the results of avg_comparisons() were consistent with avg_predictions(). Now they are only consistent with avg_predictions(type = "response").

Yes, this is inconsistent.

On the one hand, I think there are good reasons to backtransform the predictions, because the resulting estimates have nice statistical properties. On the other hand, I am not aware of a good "general" solution for applying the same kind of backtransformation to arbitrary functions of predictions, like the ones estimated by comparisons(). So the desiderata of (a) good statistical properties and (b) user-interface consistency come into conflict.

I've chosen (a) but recognize that there are downsides to this.
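The non-commutativity underlying this trade-off is easy to see in isolation: a nonlinear backtransform of an average is generally not the average of the backtransforms. A minimal illustration with the logistic link (my own numbers, just for demonstration):

```r
# plogis() is nonlinear, so averaging and backtransforming do not commute.
eta <- c(-3, 0, 1)    # some arbitrary link-scale predictions
plogis(mean(eta))     # backtransform of the average, approx 0.339
mean(plogis(eta))     # average of the backtransforms, approx 0.426
```

The two quantities coincide only in special cases (e.g. a perfectly symmetric distribution of linear predictors), so no single backtransformation rule can make both aggregation orders agree in general.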