testing packages -- issues
BERENZ opened this issue · 0 comments
Code to generate data
library(nonprobsvy)
library(survey)
set.seed(123)
N <- 1e5
n <- 1000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
y <- 1 + x1 + x2 + rnorm(N)
rho <- plogis(exp(-1 + x2 + x3)/ (1 + exp(-1 + x2 + x3)))
br <- rbinom(N, size = 1, prob = rho)
random_s <- sample(1:N, size = n)
pop_df <- data.frame(x1, x2, x3, y, rho, br, random_s = 1:N %in% random_s, random_p = n / N)
nonprob_df <- subset(pop_df, br == 1)
prob_df <- subset(pop_df, random_s == 1)
prob_df_svy <- svydesign(ids = ~1, probs = ~random_p, data = prob_df)
Mass Imputation
MI runs, but the bootstrap is applied even though it was not requested. Moreover, the bootstrap variance (1.682901e-05) differs substantially from the analytic variance (0.01837757).
res_mi <- nonprobsvy::nonprob(outcome = y ~ x1 + x2,
                              data = nonprob_df,
                              svydesign = prob_df_svy,
                              family_outcome = "gaussian",
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))
Output
$population_mean
[1] 1.001379
$variance
[,1]
[1,] 0.01837757
$standard_error
[,1]
[1,] 0.1355639
$CI
[1] 0.7356787 1.2670794
$beta
[,1]
(Intercept) 0.9991237
x1 1.0019048
x2 0.9890786
$boot_variance
[1] 1.682901e-05
I suggest:
- variance and standard_error should be scalars
- bootstrap should be conducted only when it is selected as an inference method in controlInf
- decompose standard_error into the SE from the nonprob sample and the SE from the prob survey
- compare results from survey (code below) with nonprobsvy::nonprob
m1 <- lm(y ~ x1 + x2, nonprob_df)
prob_df_svy <- update(prob_df_svy, y = predict(m1, prob_df_svy$variables))
svymean(~y, prob_df_svy)
    mean     SE
y 1.0014 0.0437
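The first suggestion and the size of the discrepancy can be checked with a few lines of base R. This is a minimal sketch that re-enters the numbers printed above by hand rather than calling nonprob; the list res_like and its layout are stand-ins for illustration, not the package's actual return object.

```r
# Hand-copied values from the outputs above; `res_like` is a hypothetical
# stand-in for the object returned by nonprob(), not the real object.
res_like <- list(
  population_mean = 1.001379,
  variance        = matrix(0.01837757, nrow = 1, ncol = 1),
  boot_variance   = 1.682901e-05
)

# drop() removes the 1 x 1 matrix dimensions, leaving a plain numeric scalar.
variance_scalar <- drop(res_like$variance)
se_analytic     <- sqrt(variance_scalar)

# Normal-approximation 95% confidence interval built from the scalar SE.
ci <- res_like$population_mean + c(-1, 1) * qnorm(0.975) * se_analytic

# Standard errors implied by the three sources in this report.
se_compare <- c(
  analytic  = se_analytic,
  bootstrap = sqrt(res_like$boot_variance),
  survey    = 0.0437  # SE printed by svymean() above
)

print(is.null(dim(variance_scalar)))                   # TRUE means "no matrix dims"
print(round(ci, 4))                                    # compare with $CI above
print(signif(se_compare / se_compare[["survey"]], 3))  # ratios to the svymean SE
```

On these numbers the interval agrees with the reported $CI to about four decimal places, the analytic SE is roughly three times the svymean SE, and the bootstrap SE is less than a tenth of it.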
Propensity score
When I run the code
res_ps <- nonprobsvy::nonprob(selection = br ~ x2 + x3,
                              data = nonprob_df,
                              svydesign = prob_df_svy,
                              family_selection = "binomial",
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))
res_ps
I get the following error:
Error in rbind(X_nons, X_rand) :
number of columns of matrices must match (see arg 2)
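The message says that the two design matrices being stacked have different numbers of columns. How nonprob builds X_nons and X_rand internally is not shown in this report, so the sketch below only illustrates the failure mode and a quick diagnostic on toy data; the data frames and formulas in it are hypothetical.

```r
set.seed(1)
# Toy data; toy_nons and toy_rand are hypothetical stand-ins for the two samples.
toy_nons <- data.frame(x2 = rnorm(5), x3 = rnorm(5))
toy_rand <- data.frame(x1 = rnorm(4), x2 = rnorm(4), x3 = rnorm(4))

# Design matrices built from different right-hand sides, so the
# column counts differ (3 versus 4, including the intercept).
X_nons <- model.matrix(~ x2 + x3, data = toy_nons)
X_rand <- model.matrix(~ x1 + x2 + x3, data = toy_rand)

# Stacking fails because the column counts differ; capture the message.
stack_msg <- tryCatch(
  {
    rbind(X_nons, X_rand)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Diagnostic: which columns appear in one matrix but not the other?
only_in_rand <- setdiff(colnames(X_rand), colnames(X_nons))
only_in_nons <- setdiff(colnames(X_nons), colnames(X_rand))

print(stack_msg)
print(only_in_rand)
print(only_in_nons)
```

Running the same setdiff comparison on the matrices that nonprob actually constructs would show which column is missing from which sample.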
Doubly robust (DR)
When I run the code
res_ps <- nonprobsvy::nonprob(selection = br ~ x2 + x3,
                              outcome = y ~ x1 + x2,
                              data = nonprob_df,
                              svydesign = prob_df_svy,
                              family_outcome = "gaussian",
                              family_selection = "binomial",
                              control_inference = controlInf(est_method = "likelihood",
                                                             var_method = "analytic"))
I get an error, which may be related to the fact that the selection variable (br) is equal to 1 for every row of the nonprob_df dataset.
Error in if (is.null(newVal) && ((sum(f1) - sum(f0)) < slot(control, "tol"))) { :
missing value where TRUE/FALSE needed
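The suspicion above and the error text can both be illustrated in base R without nonprobsvy. The sketch below uses a small simulated population (hypothetical, not the one generated above) to show that subsetting on br == 1 leaves a constant response, and then shows the generic R behavior behind the message: an if condition that evaluates to NA. Which quantity becomes NA inside the fitting routine is not established by this report.

```r
set.seed(42)
# Small hypothetical population, only to illustrate the subsetting step.
toy_pop <- data.frame(x2 = rnorm(200), x3 = rnorm(200))
toy_pop$br <- rbinom(200, size = 1, prob = plogis(-1 + toy_pop$x2 + toy_pop$x3))

# Same subsetting as in the report: keep only the selected units.
toy_nonprob <- subset(toy_pop, br == 1)

# After subsetting, the selection indicator takes a single value.
br_values <- unique(toy_nonprob$br)
print(br_values)

# Generic R behavior: a comparison involving NaN yields NA, and `if`
# stops with an error when its condition is NA.
cond <- (NaN - 0) < 1e-8
na_msg <- tryCatch(
  if (cond) "converged" else "not converged",
  error = function(e) conditionMessage(e)
)
print(is.na(cond))
print(na_msg)
```

A guard such as length(unique(nonprob_df$br)) > 1 before using br as a response would flag the constant indicator early; whether that is the right fix for nonprob is a separate question.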