ncn-foreigners/nonprobsvy

testing packages -- issues

BERENZ opened this issue · 0 comments

Code to generate data

library(nonprobsvy)
library(survey)
set.seed(123)
N <- 1e5
n <- 1000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
y <- 1 + x1 + x2 + rnorm(N)
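# Note: plogis() is applied to a value already on the probability scale, so rho lies roughly in (0.50, 0.73)
rho <- plogis(exp(-1 + x2 + x3) / (1 + exp(-1 + x2 + x3)))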
br <- rbinom(N, size = 1, prob = rho)
random_s <- sample(1:N, size = n)
pop_df <- data.frame(x1, x2, x3, y, rho, br, random_s = 1:N %in% random_s, random_p = n / N)
nonprob_df <- subset(pop_df, br == 1)
prob_df <- subset(pop_df, random_s == 1)
prob_df_svy <- svydesign(ids = ~1, probs = ~random_p, data = prob_df)

Mass Imputation

Mass imputation (MI) runs, but a bootstrap variance is computed even though it was not requested. Moreover, the bootstrap variance (1.682901e-05) is roughly three orders of magnitude smaller than the analytic variance (0.01837757).

res_mi <- nonprobsvy::nonprob(outcome = y ~ x1 + x2, 
                              data = nonprob_df, 
                              svydesign = prob_df_svy, 
                              family_outcome = "gaussian", 
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))

Output

$population_mean
[1] 1.001379

$variance
           [,1]
[1,] 0.01837757

$standard_error
          [,1]
[1,] 0.1355639

$CI
[1] 0.7356787 1.2670794

$beta
                 [,1]
(Intercept) 0.9991237
x1          1.0019048
x2          0.9890786

$boot_variance
[1] 1.682901e-05
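
The figures in this output can be cross-checked with a few lines of base R. The sketch below hard-codes the values printed above (the object names are illustrative and not part of nonprobsvy). It checks that the standard error is the square root of the analytic variance, that the CI matches a normal-approximation 95% interval, and how far apart the analytic and bootstrap variances are. It also shows that drop() turns a 1 x 1 matrix into a plain scalar.

```r
# Values copied from the output above; names are illustrative only.
pop_mean  <- 1.001379
variance  <- matrix(0.01837757)   # printed as a 1 x 1 matrix
std_error <- matrix(0.1355639)    # printed as a 1 x 1 matrix
boot_var  <- 1.682901e-05

# drop() removes the matrix dimensions and leaves a plain numeric scalar.
variance_scalar <- drop(variance)
se_scalar <- drop(std_error)

# The standard error should equal the square root of the variance.
se_check <- sqrt(variance_scalar)

# Normal-approximation 95% confidence interval around the estimated mean.
ci_check <- pop_mean + c(-1, 1) * qnorm(0.975) * se_scalar

# Ratio of the analytic variance to the bootstrap variance.
var_ratio <- variance_scalar / boot_var

print(c(se = se_check, lower = ci_check[1], upper = ci_check[2], ratio = var_ratio))
```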

I suggest:

  • variance and standard_error should be returned as scalars, not 1 x 1 matrices
  • bootstrap should be run only when it is selected as the inference method in controlInf
  • decompose standard_error into the components due to the nonprobability sample and the probability survey
  • compare results from survey (code below) with nonprobsvy::nonprob; here survey gives SE = 0.0437, whereas nonprob reports 0.1356
m1 <- lm(y ~ x1 + x2, data = nonprob_df)
prob_df_svy <- update(prob_df_svy, y = predict(m1, prob_df_svy$variables))
svymean(~y, prob_df_svy)

    mean     SE
y 1.0014 0.0437
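
The survey-based SE above treats the predicted values as fixed, so it reflects only the sampling variability of the probability sample and not the uncertainty from estimating the outcome model on the nonprobability sample. As a rough check, the sketch below regenerates the data in base R (no nonprobsvy or survey needed) and computes that design component by hand. It assumes equal weights and a with-replacement variance, which is what svydesign uses when no fpc is given, so the SE reduces to sd / sqrt(n).

```r
# Regenerate the data from the top of the issue (base R only).
set.seed(123)
N <- 1e5
n <- 1000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
y <- 1 + x1 + x2 + rnorm(N)
rho <- plogis(exp(-1 + x2 + x3) / (1 + exp(-1 + x2 + x3)))
br <- rbinom(N, size = 1, prob = rho)
random_s <- sample(1:N, size = n)
pop_df <- data.frame(x1, x2, x3, y, br, random_s = 1:N %in% random_s)
nonprob_df <- subset(pop_df, br == 1)
prob_df <- subset(pop_df, random_s)

# Fit the outcome model on the nonprobability sample, predict on the probability sample.
m1 <- lm(y ~ x1 + x2, data = nonprob_df)
y_hat <- predict(m1, newdata = prob_df)

# With equal weights, the with-replacement design-based SE of the mean is sd / sqrt(n).
mi_mean <- mean(y_hat)
se_design <- sd(y_hat) / sqrt(length(y_hat))
print(c(mean = mi_mean, se_design = se_design))
```

If this hand-computed value is close to the svymean SE, the remaining gap to the nonprob analytic SE would have to come from the nonprobability-sample component, which is why a decomposed standard error would help here.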

Propensity score

When I run the code

res_ps <- nonprobsvy::nonprob(selection = br  ~ x2 + x3, 
                              data = nonprob_df, 
                              svydesign = prob_df_svy, 
                              family_selection = "binomial",
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))
res_ps

I get the following error:

Error in rbind(X_nons, X_rand) :
number of columns of matrices must match (see arg 2)
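
This message comes from base rbind() when the two design matrices have different numbers of columns. The cause inside nonprob is not established here. The sketch below uses toy data frames (X_nons and X_rand are stand-ins named after the objects in the error message, not the package internals). It shows that building both matrices from the same right-hand side yields matching columns, and that the error appears once one matrix is missing a column.

```r
# Toy stand-ins for the two samples; only the covariate names matter here.
set.seed(1)
nons_toy <- data.frame(x2 = rnorm(5), x3 = rnorm(5))
rand_toy <- data.frame(x2 = rnorm(4), x3 = rnorm(4))

# Building both design matrices from the same right-hand side gives matching columns.
rhs <- ~ x2 + x3
X_nons <- model.matrix(rhs, data = nons_toy)
X_rand <- model.matrix(rhs, data = rand_toy)
same_cols <- identical(colnames(X_nons), colnames(X_rand))
X_all <- rbind(X_nons, X_rand)

# If one matrix lacks a column (here the intercept), rbind() fails as in the report.
msg <- tryCatch(
  rbind(X_nons, X_rand[, -1, drop = FALSE]),
  error = function(e) conditionMessage(e)
)
print(same_cols)
print(msg)
```

Comparing colnames() of the two matrices just before the rbind() call would show which column is missing on one side.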

Doubly robust (DR)

When I run the code

res_dr <- nonprobsvy::nonprob(selection = br  ~ x2 + x3, 
                              outcome = y ~ x1 + x2, 
                              data = nonprob_df, 
                              svydesign = prob_df_svy, 
                              family_outcome = "gaussian", 
                              family_selection = "binomial",
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))

I get an error that may be related to the selection variable (br) being equal to 1 for every row of nonprob_df:

Error in if (is.null(newVal) && ((sum(f1) - sum(f0)) < slot(control, "tol"))) { :
missing value where TRUE/FALSE needed
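
The condition in this message fails when the comparison evaluates to NA. One mechanism that would produce this, offered only as an assumption about what happens inside the optimizer, is a non-finite objective value in two consecutive iterations. The sketch below reproduces the same error in base R with hypothetical values for f0, f1, and tol.

```r
# Hypothetical log-likelihood values; -Inf arises, for example, from log(0).
f0 <- log(0)    # previous iteration (assumed)
f1 <- log(0)    # current iteration (assumed)
tol <- 1e-8     # stand-in for slot(control, "tol")
newVal <- NULL

# -Inf - (-Inf) is NaN, so the comparison yields NA, and TRUE && NA is NA.
cond <- is.null(newVal) && ((sum(f1) - sum(f0)) < tol)

# if (NA) is an error in R; capture its message instead of stopping.
msg <- tryCatch(
  if (cond) "converged" else "continue",
  error = function(e) conditionMessage(e)
)
print(cond)
print(msg)
```

Checking whether the objective is finite at the starting values would help confirm or rule out this explanation.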