summary.fit.synds miscalculating variance estimators for simple synthesis when n is not equal to k
Closed this issue · 2 comments
I believe that the variance estimate, T_f, for synthetic data in the case that population.inf = TRUE and incomplete = FALSE is currently being miscalculated in the case that k is not equal to n.
The line of code in question from the function summary.fit.synds is
## simple synthesis
} else {
if (object$proper == FALSE) Tf <- vars*(1 + n/k/m) else Tf <- vars*(1 + (n/k + 1)/m)
and I believe that it should read
## simple synthesis
} else {
if (object$proper == FALSE) Tf <- vars*(k/n + 1/m) else Tf <- vars*(k/n + (k/n + 1)/m)
so that it is consistent with the variance estimators that are defined in section 2.2 of Practical data synthesis for large samples (Raab et al., 2016).
Thank you so much for looking at our code; we do appreciate it. But I think in this instance you may be mistaken. The quantity vars in our code is not the average variance as estimated from the m synthetic data sets (\bar{v_M} in the paper). In our code, \bar{v_M} is the component mvaravg of the syn object. The quantity vars is calculated on line 11 of the function by multiplying object$mvaravg by k/n. It represents the estimate of the variance of the parameters were they estimated from the original data.
Let me know if you agree. Thanks for taking an interest in our work. We are always pleased to hear how someone may be using synthpop, so feel free to email us to let us know.
Gillian Raab gillian.raab@ed.ac.uk
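The equivalence described in this reply can be checked numerically. The following is a minimal sketch in R, not code from synthpop: the values of n, k, m and mvaravg are made up for illustration, and the variable names Tf_code_* and Tf_alt_* are hypothetical. Only the two sets of formulas and the k/n rescaling of mvaravg are taken from the thread.

```r
# Hypothetical values for illustration only; not taken from synthpop or the paper.
n <- 1000        # number of records in the original data
k <- 250         # number of records in each synthetic data set
m <- 5           # number of synthetic data sets
mvaravg <- 0.04  # stands in for object$mvaravg, i.e. \bar{v_M}

# As described in the reply: vars is mvaravg rescaled by k/n.
vars <- mvaravg * k / n

# Formulas as quoted from summary.fit.synds, applied to vars.
Tf_code_simple <- vars * (1 + n / k / m)
Tf_code_proper <- vars * (1 + (n / k + 1) / m)

# Formulas from the proposed change, applied to \bar{v_M} instead of vars.
Tf_alt_simple <- mvaravg * (k / n + 1 / m)
Tf_alt_proper <- mvaravg * (k / n + (k / n + 1) / m)

# Each pair should agree up to floating-point error.
print(isTRUE(all.equal(Tf_code_simple, Tf_alt_simple)))
print(isTRUE(all.equal(Tf_code_proper, Tf_alt_proper)))
```

Under these assumptions the existing code and the proposed expressions give the same T_f, because multiplying by k/n inside vars is algebraically the same as moving the factor k/n into the bracket.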
You are correct and I was mistaken. Thank you very much for your response.
Flynn