PSs are different if the covariates are specified in different orders on the right-hand side of the formula when method = "cbps"
danielebottigliengo opened this issue · 2 comments
Hi!
Thanks for the great package!
I have noticed that when method = "cbps"
is used in the weightit
function different PSs values are returned if the covariates on the right-hand side of the formula are specified in different orders. However, if PSs are estimated "outside" of the weightit
function using CBPS::CBPS
they look identical. Here's a reproducible example using lalonde
dataset.
library(WeightIt)
library(CBPS)
#> Loading required package: MASS
#> Loading required package: MatchIt
#> Loading required package: nnet
#> Loading required package: numDeriv
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loaded glmnet 4.0-2
#> CBPS: Covariate Balancing Propensity Score
#> Version: 0.21
#> Authors: Christian Fong [aut, cre],
#> Marc Ratkovic [aut],
#> Kosuke Imai [aut],
#> Chad Hazlett [ctb],
#> Xiaolin Yang [ctb],
#> Sida Peng [ctb]
data("lalonde")
# 1) PS estimation using CBPS function ---------------------------------
ps_1 <- CBPS(
treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde, ATT = 0
)
ps_2 <- CBPS(
treat ~ educ + race + nodegree + re74 + re75 + married + age,
data = lalonde, ATT = 0
)
head(data.frame(
ps1 = ps_1$fitted.values,
ps2 = ps_2$fitted.values
))
#> ps1 ps2
#> 1 0.6283220 0.6283220
#> 2 0.2217905 0.2217905
#> 3 0.6629598 0.6629598
#> 4 0.7595289 0.7595289
#> 5 0.6939144 0.6939144
#> 6 0.6906981 0.6906981
# The PSs are identical
# 2) PS estimation using weightit --------------------------------------
wt_1 <- weightit(
treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde,
method = "cbps",
estimand = "ATE"
)
wt_2 <- weightit(
treat ~ educ + race + nodegree + re74 + re75 + married + age,
data = lalonde,
method = "cbps",
estimand = "ATE"
)
head(data.frame(
ps1 = wt_1$ps,
ps2 = wt_2$ps
))
#> ps1 ps2
#> 1 0.6897437 0.5851259
#> 2 0.2157126 0.2275898
#> 3 0.6599305 0.7606147
#> 4 0.7763387 0.7580954
#> 5 0.6329184 0.6942782
#> 6 0.6930129 0.6907725
# The PSs are not identical
Created on 2020-12-28 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Italian_Italy.1252
#> ctype Italian_Italy.1252
#> tz Europe/Berlin
#> date 2020-12-28
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3)
#> CBPS * 0.21 2019-08-21 [1] CRAN (R 4.0.0)
#> cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.3)
#> codetools 0.2-16 2018-12-24 [2] CRAN (R 4.0.3)
#> colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
#> ggplot2 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
#> glmnet * 4.0-2 2020-06-16 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.3)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
#> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.3)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
#> MASS * 7.3-53 2020-09-09 [2] CRAN (R 4.0.3)
#> MatchIt * 4.1.0 2020-12-15 [1] CRAN (R 4.0.3)
#> Matrix * 1.2-18 2019-11-27 [2] CRAN (R 4.0.3)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
#> nnet * 7.3-14 2020-04-26 [2] CRAN (R 4.0.3)
#> numDeriv * 2016.8-1.1 2019-06-06 [1] CRAN (R 4.0.0)
#> pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.3)
#> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.3)
#> ps 1.5.0 2020-12-05 [1] CRAN (R 4.0.3)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.9 2020-11-26 [1] CRAN (R 4.0.3)
#> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.3)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> shape 1.4.5 2020-09-13 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> survival 3.2-7 2020-09-28 [2] CRAN (R 4.0.3)
#> testthat 3.0.1 2020-12-17 [1] CRAN (R 4.0.3)
#> tibble 3.0.4 2020-10-12 [1] CRAN (R 4.0.3)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> usethis 2.0.0 2020-12-10 [1] CRAN (R 4.0.3)
#> vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.3)
#> WeightIt * 0.10.2 2020-08-21 [1] Github (ngreifer/WeightIt@7811149)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3)
#> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/Daniele/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library
I also tried using method = "ps"
and method = "gbm"
, but I didn't notice any differences in the PSs values. I wasn't expecting different PSs values when covariates are specified in different orders, but maybe there is something that I am missing.
Thank you very much!
Thank you so much for letting me know about this! It should e fixed now.
The problem was due to WegithIt
splitting the race
factor variable into three categories and supplying all of them to CBPS()
, which meant the design matrix was not full rank, leading to erratic behavior. You can actually replicate this using CBPS()
in the following way:
lalonde_split <- cobalt::splitfactor(lalonde, drop.first = FALSE)
ps_1 <- CBPS(
treat ~ age + educ + race_black + race_hispan + race_white + married + nodegree + re74 + re75,
data = lalonde_split, ATT = 0
)
ps_2 <- CBPS(
treat ~ educ + race_black + race_hispan + race_white + nodegree + re74 + re75 + married + age,
data = lalonde_split, ATT = 0
)
head(data.frame(
ps1 = ps_1$fitted.values,
ps2 = ps_2$fitted.values
))
I've fixed weightit()
so that it ensures the matrix is full rank, yielding the same propensity scores every time.
Thank you so much @ngreifer for your quick answer!