leeper/margins

margins() ignores use of I() inside formula

lang-benjamin opened this issue · 11 comments

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

Using I() inside a formula does not work, margins() ignores it. As an example, let's do a linear regression of mpg vs. hp from the mtcars data and re-scale hp.

library(margins)
# re-scale predictor variable hp manually works
mtcars2 <- mtcars
mtcars2$hp <- mtcars2$hp / 50
margins(lm(mpg ~ hp, data = mtcars2))
#> Average marginal effects
#> lm(formula = mpg ~ hp, data = mtcars2)
#>      hp
#>  -3.411

# re-scale within formula does not work:
# margins() ignores re-scaling via I(hp / 50) and produces same result as in
# margins(lm(mpg ~ hp, data = mtcars))
ame_scaled <- margins(lm(mpg ~ I(hp / 50), data = mtcars))
ame_scaled
#> Average marginal effects
#> lm(formula = mpg ~ I(hp/50), data = mtcars)
#>        hp
#>  -0.06823
all.equal(
  summary(ame_scaled)$AME[[1]],
  summary(margins(lm(mpg ~ hp, data = mtcars)))$AME[[1]]
)
#> [1] TRUE

Created on 2020-06-21 by the reprex package (v0.3.0)

Session info
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] margins_0.3.23
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6      digest_0.6.25     MASS_7.3-51.6     magrittr_1.5     
#>  [5] evaluate_0.14     highr_0.8         rlang_0.4.6       stringi_1.4.6    
#>  [9] data.table_1.12.8 rmarkdown_2.1     prediction_0.3.14 tools_4.0.0      
#> [13] stringr_1.4.0     xfun_0.14         yaml_2.2.1        compiler_4.0.0   
#> [17] htmltools_0.4.0   knitr_1.28

This is by design. margins() reports effects at the scale of the original variable, not the term generated by I().

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

Thanks. I'll look into that more - the intention is for it to report based on original variable. If that's not happening, it's probably a bug.

Thanks. Sorry again, where is this intention coming from? That essentially means margins does not allow for non-linear modelling/transformation. Are there specific obstacles when calculating marginal effects for such transformations?

Obviously when it comes to non-linear transformation marginal effect at the scale of the original variable is no longer constant across different values of the original predictor.

Thanks for the quick reply! Let me ask one more question about the general principle behind this. So, how about other transformations like log(hp), stats::poly(hp, 2) or splines::ns(hp)? The first two are not ignored by margins() whereas the last one will again be ignored.

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):

library(margins)
margins(lm(mpg ~ wt + I(wt^2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ wt + I(wt^2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ stats::poly(wt, 2), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ stats::poly(wt, 2), data = mtcars)
#>      wt
#>  -5.845
margins(lm(mpg ~ log(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ log(wt), data = mtcars)
#>      wt
#>  -5.855
margins(lm(mpg ~ splines::ns(wt), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt), data = mtcars)
#>      wt
#>  -5.344
margins(lm(mpg ~ splines::ns(wt, 5), data = mtcars))
#> Average marginal effects
#> lm(formula = mpg ~ splines::ns(wt, 5), data = mtcars)
#>      wt
#>  -5.931

Created on 2021-04-22 by the reprex package (v1.0.0)

By the way - however perhaps this is not the best place to talk about it - rather unexpected consequence of margins() reporting effects for variables on their original scale is that it is - in some situations - sensitive to change in the data, that was used to estimate the model. See:

library(margins)
# model without within-formula data transformations
m1 <- lm(mpg ~ wt, data = mtcars)
# model with some within-formula data transformation
m2 <- lm(mpg ~ I(wt^2), data = mtcars)
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.542

# now let's change the dataset
mtcars <- subset(mtcars, carb < 4)

# and let's call margins() on the (unaffected!) models once more

# for model with no  within-formula data transformations we get the same results
margins(m1)
#> Average marginal effects
#> lm(formula = mpg ~ wt, data = mtcars)
#>      wt
#>  -5.344

# but for model with some within-formula data transformations results are different!
margins(m2)
#> Average marginal effects
#> lm(formula = mpg ~ I(wt^2), data = mtcars)
#>      wt
#>  -4.068

Created on 2021-04-22 by the reprex package (v2.0.0)

That's because in the latter model there is no way to get the original variable other than turn back to the dataset (there is no untransformed data in the a model object) and consequently margins() works on the dataset in its current state. On the other hand, with no within-formula data transformation margins() can rely on a model frame that is stored in a model object itself to perform all the computations, so these results stands unaffected.

That's an interesting find, and I think it should be included in the documentation somewhere, if it isn't yet.

For the specific case you describe (subsetting), margins could even figure out that the data was changed, because the design matrix has fewer rows than the updated mtcars data. I think that margins should throw an error in that case. Of course, it would not be possible to detect other changes in the original data.

I think it would be worth creating a new issue for this problem.

It should be possible to detect any changes (that affect results) by comparing model frame from the model object with model frame constructed using data object in its current state and the model call. However detecting such changes perhaps should result in warning instead of error.

Yes, that would be a pretty expensive operation for large models, though, and you would have to do it for every margins() call. I definitely think that this should be addressed in some way.

Just adding a comment here, because I was confused by this issue. In my testing, all of these transformations are reported on the scale of the original predictor (which is also what I would have expected):
...

Thanks for the answer! I realize that I misunderstood the initial answer. I thought that the intention was to ignore usage of I() or any other transformation. It is however not ignored, just the final reporting is then in terms of the original variable (which makes sense).