ngreifer/WeightIt

PSs are different if the covariates are specified in different orders on the right-hand side of the formula when method = "cbps"

danielebottigliengo opened this issue · 2 comments

Hi!

Thanks for the great package!

I have noticed that when method = "cbps" is used in the weightit function, different PS values are returned if the covariates on the right-hand side of the formula are specified in different orders. However, if the PSs are estimated "outside" of the weightit function using CBPS::CBPS, they are identical. Here's a reproducible example using the lalonde dataset.

library(WeightIt)
library(CBPS)
#> Loading required package: MASS
#> Loading required package: MatchIt
#> Loading required package: nnet
#> Loading required package: numDeriv
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loaded glmnet 4.0-2
#> CBPS: Covariate Balancing Propensity Score
#> Version: 0.21
#> Authors: Christian Fong [aut, cre],
#>   Marc Ratkovic [aut],
#>   Kosuke Imai [aut],
#>   Chad Hazlett [ctb],
#>   Xiaolin Yang [ctb],
#>   Sida Peng [ctb]

data("lalonde")

# 1) PS estimation using CBPS function ---------------------------------
ps_1 <- CBPS(
  treat ~ age + educ + race + married + nodegree + re74 + re75,
  data = lalonde, ATT = 0
)

ps_2 <- CBPS(
  treat ~ educ + race + nodegree + re74 + re75 + married + age,
  data = lalonde, ATT = 0
)

head(data.frame(
  ps1 = ps_1$fitted.values,
  ps2 = ps_2$fitted.values
))
#>         ps1       ps2
#> 1 0.6283220 0.6283220
#> 2 0.2217905 0.2217905
#> 3 0.6629598 0.6629598
#> 4 0.7595289 0.7595289
#> 5 0.6939144 0.6939144
#> 6 0.6906981 0.6906981

# The PSs are identical

# 2) PS estimation using weightit --------------------------------------
wt_1 <- weightit(
  treat ~ age + educ + race + married + nodegree + re74 + re75,
  data = lalonde,
  method = "cbps",
  estimand = "ATE"
)

wt_2 <- weightit(
  treat ~ educ + race + nodegree + re74 + re75 + married + age,
  data = lalonde,
  method = "cbps",
  estimand = "ATE"
)

head(data.frame(
  ps1 = wt_1$ps,
  ps2 = wt_2$ps
))
#>         ps1       ps2
#> 1 0.6897437 0.5851259
#> 2 0.2157126 0.2275898
#> 3 0.6599305 0.7606147
#> 4 0.7763387 0.7580954
#> 5 0.6329184 0.6942782
#> 6 0.6930129 0.6907725

# The PSs are not identical

Created on 2020-12-28 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  Italian_Italy.1252          
#>  ctype    Italian_Italy.1252          
#>  tz       Europe/Berlin               
#>  date     2020-12-28                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source                            
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                    
#>  backports     1.2.1      2020-12-09 [1] CRAN (R 4.0.3)                    
#>  callr         3.5.1      2020-10-13 [1] CRAN (R 4.0.3)                    
#>  CBPS        * 0.21       2019-08-21 [1] CRAN (R 4.0.0)                    
#>  cli           2.2.0      2020-11-20 [1] CRAN (R 4.0.3)                    
#>  codetools     0.2-16     2018-12-24 [2] CRAN (R 4.0.3)                    
#>  colorspace    2.0-0      2020-11-11 [1] CRAN (R 4.0.3)                    
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                    
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                    
#>  devtools      2.3.2      2020-09-18 [1] CRAN (R 4.0.2)                    
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.3)                    
#>  dplyr         1.0.2      2020-08-18 [1] CRAN (R 4.0.2)                    
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                    
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                    
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                    
#>  foreach       1.5.1      2020-10-15 [1] CRAN (R 4.0.3)                    
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                    
#>  generics      0.1.0      2020-10-31 [1] CRAN (R 4.0.3)                    
#>  ggplot2       3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                    
#>  glmnet      * 4.0-2      2020-06-16 [1] CRAN (R 4.0.2)                    
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                    
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.0.0)                    
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)                    
#>  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.2)                    
#>  iterators     1.0.13     2020-10-15 [1] CRAN (R 4.0.3)                    
#>  knitr         1.30       2020-09-22 [1] CRAN (R 4.0.2)                    
#>  lattice       0.20-41    2020-04-02 [2] CRAN (R 4.0.3)                    
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                    
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.3)                    
#>  MASS        * 7.3-53     2020-09-09 [2] CRAN (R 4.0.3)                    
#>  MatchIt     * 4.1.0      2020-12-15 [1] CRAN (R 4.0.3)                    
#>  Matrix      * 1.2-18     2019-11-27 [2] CRAN (R 4.0.3)                    
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.0)                    
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.0.0)                    
#>  nnet        * 7.3-14     2020-04-26 [2] CRAN (R 4.0.3)                    
#>  numDeriv    * 2016.8-1.1 2019-06-06 [1] CRAN (R 4.0.0)                    
#>  pillar        1.4.7      2020-11-20 [1] CRAN (R 4.0.3)                    
#>  pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.0.3)                    
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)                    
#>  pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.0)                    
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                    
#>  processx      3.4.5      2020-11-30 [1] CRAN (R 4.0.3)                    
#>  ps            1.5.0      2020-12-05 [1] CRAN (R 4.0.3)                    
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                    
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.3)                    
#>  Rcpp          1.0.5      2020-07-06 [1] CRAN (R 4.0.2)                    
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                    
#>  rlang         0.4.9      2020-11-26 [1] CRAN (R 4.0.3)                    
#>  rmarkdown     2.6        2020-12-14 [1] CRAN (R 4.0.3)                    
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.0.3)                    
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 4.0.0)                    
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                    
#>  shape         1.4.5      2020-09-13 [1] CRAN (R 4.0.2)                    
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                    
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                    
#>  survival      3.2-7      2020-09-28 [2] CRAN (R 4.0.3)                    
#>  testthat      3.0.1      2020-12-17 [1] CRAN (R 4.0.3)                    
#>  tibble        3.0.4      2020-10-12 [1] CRAN (R 4.0.3)                    
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.0)                    
#>  usethis       2.0.0      2020-12-10 [1] CRAN (R 4.0.3)                    
#>  vctrs         0.3.6      2020-12-17 [1] CRAN (R 4.0.3)                    
#>  WeightIt    * 0.10.2     2020-08-21 [1] Github (ngreifer/WeightIt@7811149)
#>  withr         2.3.0      2020-09-22 [1] CRAN (R 4.0.3)                    
#>  xfun          0.19       2020-10-30 [1] CRAN (R 4.0.3)                    
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                    
#> 
#> [1] C:/Users/Daniele/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library

I also tried method = "ps" and method = "gbm", and with those the PS values did not change. I wasn't expecting the PS values to depend on the order in which the covariates are specified, but maybe there is something that I am missing.

Thank you very much!

Thank you so much for letting me know about this! It should be fixed now.

The problem was caused by WeightIt splitting the race factor variable into three indicator variables (one per category) and supplying all of them to CBPS(). Together with the intercept, this meant the design matrix was not full rank, leading to erratic behavior. You can actually replicate this using CBPS() in the following way:

lalonde_split <- cobalt::splitfactor(lalonde, drop.first = FALSE)

ps_1 <- CBPS(
  treat ~ age + educ + race_black + race_hispan + race_white + married + nodegree + re74 + re75,
  data = lalonde_split, ATT = 0
)

ps_2 <- CBPS(
  treat ~ educ + race_black + race_hispan + race_white + nodegree + re74 + re75 + married + age,
  data = lalonde_split, ATT = 0
)

head(data.frame(
  ps1 = ps_1$fitted.values,
  ps2 = ps_2$fitted.values
))
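
The rank deficiency itself can be checked with base R alone. The following is a minimal sketch on a toy data frame (`toy` and its values are made up for illustration and are not taken from lalonde): with an intercept in the model, the three indicator columns sum to the intercept column, so the design matrix has one more column than its rank.

```r
# Toy data (hypothetical values) with a three-level factor like `race`
toy <- data.frame(
  age  = c(23, 35, 41, 29, 52, 38),
  race = factor(c("black", "hispan", "white", "black", "white", "hispan"))
)

# Default treatment coding drops the first level, so this matrix is full rank
X_full <- model.matrix(~ age + race, data = toy)

# Keep all three indicators next to the intercept, mimicking what
# splitfactor(drop.first = FALSE) followed by a formula with an intercept gives
dummies <- model.matrix(~ race - 1, data = toy)
X_split <- cbind(`(Intercept)` = 1, age = toy$age, dummies)

# Compare the number of columns with the numerical rank of each matrix
c(columns = ncol(X_full),  rank = qr(X_full)$rank)
c(columns = ncol(X_split), rank = qr(X_split)$rank)
```

When the rank is smaller than the number of columns, the coefficients are not uniquely determined, which is consistent with the fitted values changing when the columns are reordered.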

I've fixed weightit() so that it ensures the matrix is full rank, yielding the same propensity scores every time.
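
For anyone who wants to guard against this before calling CBPS() directly, here is one way a full-rank check could look. This is only a sketch under my own assumptions, not necessarily how weightit() does it internally; `make_full_rank_sketch` is a hypothetical helper name, and the toy `race` vector is made up for illustration.

```r
# Hypothetical helper: keep a linearly independent subset of the columns of X.
# qr() (default LINPACK routine) moves columns it finds to be linearly
# dependent to the end of the pivot, so the first `rank` pivot entries
# index an independent set.
make_full_rank_sketch <- function(X) {
  qr_X <- qr(X)
  keep <- sort(qr_X$pivot[seq_len(qr_X$rank)])
  X[, keep, drop = FALSE]
}

# Toy example (hypothetical values): intercept plus all three indicators
race <- factor(c("black", "hispan", "white", "black", "white", "hispan"))
X <- cbind(`(Intercept)` = 1, model.matrix(~ race - 1))

X_reduced <- make_full_rank_sketch(X)

ncol(X)          # number of columns before the check
ncol(X_reduced)  # number of columns after dropping the redundant one
colnames(X_reduced)
```

Whichever redundant column is dropped, the remaining columns span the same space as the original matrix, so no information about the covariates is lost.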

Thank you so much @ngreifer for your quick answer!