vincentarelbundock/marginaleffects

System fit coefficient names are not used if two equations share coefficient names

Closed this issue · 6 comments

Relates to #1233

library(car)
library(systemfit)
library(marginaleffects)

set.seed(12345)
N <- 100
C <- matrix(c(1, .3, .3,
              .3,  1, .3,
              .3, .3,  1), 3, 3)
colnames(C) <- rownames(C) <- c("x1", "x2", "x3")
data <- as.data.frame(mvrnorm(N, rep(0, 3), C))
sys <- with(data, {
  y1 <- 2 * x1 + 3 * x2 + x3 + 10 * rnorm(N)
  m1 <- lm(y1 ~ x1 + x2 + x3)
  y2 <- x1 + 3 * x2 + 2 * x3 + 10 * rnorm(N)
  m2 <- lm(y2 ~ x1 + x2 + x3)
  systemfit(list(formula(m1), formula(m2)), data = data)
})
h <- hypotheses(sys)

# This works
hypotheses(sys, "b3 = b7")

# This works
linearHypothesis(sys, "eq1_x2 = eq2_x2")

# This does not work
hypotheses(sys, "eq1_x2 = eq2_x2")

Hypothesis seems to work only with parameter indices, not parameter names. This should not be the case because using parameter indices is error-prone and does not make the intent of the code clear.

systemfit objects just need a get_coef() method for this to work. Alternatively, we could modify the default method of get_coef() to paste in the Component column of the insight::get_parameters() output when it is present.

For now,

get_coef.systemfit <- function (model, ...) {
    out <- insight::get_parameters(model, component = "all")
    out <- stats::setNames(out$Estimate, paste(out$Component, out$Parameter, sep = "_"))
    return(out)
}

should do the trick. You can run this interactively to get better labels before this is implemented in the package.

Oh, it looks like Vincent already fixed this in 9496606. Installing the development version should fix it.

Ooops, yes, sorry I should have closed this issue. Thanks @mronkko for the report and sorry I made you waste time investigating, @ngreifer

I do not think that this has been fully addressed. Using my example, get_coef produces names with equation numbers

> get_coef(sys)
eq1_(Intercept)          eq1_x1          eq1_x2          eq1_x3 eq2_(Intercept) 
      1.7242089       2.5062010       4.4522016      -2.3640198      -0.4534193 
         eq2_x1          eq2_x2          eq2_x3 
     -0.4228978       3.4664965       2.6820931 

However, If I run hypotheses without hypothesis argument, the equation names are missing:

> hypotheses(sys)

        Term Estimate Std. Error      z Pr(>|z|)    S  2.5 % 97.5 %
 (Intercept)    1.724      0.952  1.811  0.07014  3.8 -0.142   3.59
 x1             2.506      1.025  2.445  0.01449  6.1  0.497   4.52
 x2             4.452      1.079  4.125  < 0.001 14.7  2.337   6.57
 x3            -2.364      0.966 -2.447  0.01442  6.1 -4.258  -0.47
 (Intercept)   -0.453      0.911 -0.498  0.61867  0.7 -2.239   1.33
 x1            -0.423      0.981 -0.431  0.66634  0.6 -2.345   1.50
 x2             3.466      1.033  3.356  < 0.001 10.3  1.442   5.49
 x3             2.682      0.925  2.901  0.00372  8.1  0.870   4.49

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Using the coefficient names from get_coef(sys) in a hypothesis still produces an error:

> hypotheses(sys, "eq1_x2 = eq2_x2")
Error: To use term names in a `hypothesis` string, the same function call without
  `hypothesis` argument must produce a `term` column with unique row identifiers. You
  can use `b1`, `b2`, etc. indices instead of term names in the `hypotheses` string Ex:
  "b1 + b2 = 0" Alternatively, you can use the `newdata`, `variables`, or `by`
  arguments:
  
  mod <- lm(mpg ~ am * vs + cyl, data = mtcars)
  comparisons(mod, newdata = "mean", hypothesis = "b1 = b2")
  comparisons(mod, newdata = "mean", hypothesis = "am = vs")
  comparisons(mod, variables = "am", by = "cyl", hypothesis = "pairwise")

Here is my session info

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.0.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] texreg_1.39.4            systemfit_1.1-30         lmtest_0.9-40           
 [4] zoo_1.8-12               Matrix_1.7-0             marginaleffects_0.23.0.2
 [7] MASS_7.3-61              lattice_0.22-6           car_3.1-3               
[10] carData_3.0-5            lubridate_1.9.3          forcats_1.0.0           
[13] stringr_1.5.1            dplyr_1.1.4              purrr_1.0.2             
[16] readr_2.1.5              tidyr_1.3.1              tibble_3.2.1            
[19] ggplot2_3.5.1            tidyverse_2.0.0         

loaded via a namespace (and not attached):
 [1] sandwich_3.1-1    utf8_1.2.4        generics_0.1.3    stringi_1.8.4     hms_1.1.3        
 [6] magrittr_2.0.3    grid_4.4.1        timechange_0.3.0  backports_1.5.0   Formula_1.2-5    
[11] httr_1.4.7        fansi_1.0.6       scales_1.3.0      abind_1.4-8       cli_3.6.3        
[16] rlang_1.1.4       munsell_0.5.1     withr_3.0.1       tools_4.4.1       tzdb_0.4.0       
[21] checkmate_2.3.2   colorspace_2.1-1  vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.4  
[26] insight_0.20.5.1  pkgconfig_2.0.3   pillar_1.9.0      gtable_0.3.5      glue_1.8.0       
[31] data.table_1.16.2 Rcpp_1.0.13       tidyselect_1.2.1  rstudioapi_0.16.0 compiler_4.4.1  

@mronkko can you try the latest github version from this morning?

This works with the current version. Thanks!