leeper/prediction

standard errors using prediction() in R vs margins in Stata

sgenter1 opened this issue · 0 comments

  • a question about package functionality

Hi,

When using the prediction function to calculate predicted means (or "margins") I get the same predicted values in both R and Stata, but different standard errors (and different confidence intervals as a result). What is the reason for the difference? Am I calculating the standard errors incorrectly below with my call to summarize (se = mean(se.fitted)). Or, possibly, is Stata calculating the errors incorrectly?

Ideally, I would be able to reproduce the same results in any software package. I would greatly appreciate any advice for how to replicate the results Stata gives in R.

suppressWarnings(suppressPackageStartupMessages({
  library(tidyverse)
  library(prediction)
}))
  
# model
mod <- lm(mpg ~ cyl + vs + hp, data = mtcars)

# predicted values
pred <- prediction(mod, at = list(cyl = c(4, 6, 8)))

pred %>% 
  group_by(cyl) %>% 
  summarise(pred_mean = mean(fitted) ,
            se = mean(se.fitted),
            ci_low = pred_mean - (1.96 * se),
            ci_high = pred_mean + (1.96 * se),
            total_n = n()) %>% 
  as.data.frame() # to carry out decimals further for printing
#>   cyl pred_mean       se   ci_low  ci_high total_n
#> 1   4  25.60488 1.907449 21.86628 29.34348      32
#> 2   6  20.56328 1.415614 17.78867 23.33788      32
#> 3   8  15.52167 1.771571 12.04939 18.99395      32


# stata output, and code to replicate, below:

# library(haven)

# mtcars %>%
#   write_dta("mtcars.dta"))
# 
# use mtcars, clear
# reg mpg vs cyl hp
# margins, at(cyl = (4, 6, 8))
# 
# Predictive margins                              Number of obs     =         32
# Model VCE    : OLS
# 
# Expression   : Linear prediction, predict()
# 
# 1._at        : cyl             =           4
# 
# 2._at        : cyl             =           6
# 
# 3._at        : cyl             =           8
# 
# |            Delta-method
# |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
#_at |
# 1  |   25.60488   1.619802    15.81   0.000     22.28687     28.9229
# 2  |   20.56328   .5809866    35.39   0.000     19.37318    21.75337
# 3  |   15.52167   1.379056    11.26   0.000      12.6968    18.34654

Created on 2022-03-16 by the reprex package (v2.0.0)