Total SD, take 2
In #111, @chadhazlett proposed being able to specify a total standard deviation (or variance) for draw_normal_icc. I implemented this in early May. In this implementation, the user supplies an ICC and a total_sd. We generate the ICC variable stochastically, fixing one of the two sds at 1 and deriving the other from the ICC, and then we use total_sd to rescale the variable at the end.
An advantage of this approach, which I think is the one Chad suggested, is that the generated variable always has exactly the requested total standard deviation.
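A minimal sketch of the rescale-at-the-end approach described above, to make the trade-off concrete. The function name `draw_icc_rescaled` and its arguments are hypothetical and this is not the actual draw_normal_icc code. It assumes the within-cluster sd is the one fixed at 1 and that the rescaling is a pure multiplication (no centering).

```r
# Hypothetical sketch of "generate, then rescale to total_sd"; not fabricatr's code.
draw_icc_rescaled <- function(clusters, ICC, total_sd) {
  stopifnot(ICC >= 0, ICC < 1, total_sd > 0)
  # Assumption: fix the within-cluster sd at 1 and derive the between-cluster sd
  # from ICC = sd_between^2 / (sd_between^2 + sd_within^2).
  sd_within <- 1
  sd_between <- sqrt(ICC / (1 - ICC)) * sd_within
  cluster_index <- as.integer(factor(clusters))  # map cluster labels to 1..k
  n_clusters <- max(cluster_index)
  between <- rnorm(n_clusters, sd = sd_between)[cluster_index]  # one draw per cluster
  within <- rnorm(length(clusters), sd = sd_within)             # one draw per unit
  x <- between + within
  # Post-hoc rescaling: the sample sd becomes exactly total_sd, but the realised
  # between-cluster differences are stretched or shrunk along with everything else.
  x * (total_sd / sd(x))
}

set.seed(1)
x <- draw_icc_rescaled(rep(1:20, each = 10), ICC = 0.3, total_sd = 2)
sd(x)  # equals 2 up to floating-point error, for any seed
```

Because sd is scale-equivariant, multiplying by `total_sd / sd(x)` pins the sample sd regardless of the draw. That is exactly why total_sd is hit every time while the realised ICC is not.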
A disadvantage of this approach, as Neal mentioned, is that the rescaling may distort the between-group differences. Neal instead proposed using the identity total variance = within variance + between variance. Rather than scaling post hoc, the user would specify any two of the four quantities (ICC, within, between, total) and we would derive the other two. I'd have to work out the math, but this basically gives two constraints: the ICC and one of within/between determine the other of within/between and the total; the total and one of within/between determine the other of within/between and the ICC. I would have to think a bit about which combinations of arguments we would allow.
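Working on the variance scale (variances add, sds do not), the two identities are var_total = var_between + var_within and ICC = var_between / var_total. The sketch below shows that any pair of the four quantities pins down the other two, including the ICC + total and within + between pairs. The helper `solve_icc_components` is hypothetical and is not part of any existing API. It only illustrates the algebra, not a decision about which argument combinations to allow.

```r
# Hypothetical helper: given exactly two of ICC, var_within, var_between and
# var_total, return all four, using
#   var_total = var_between + var_within  and  ICC = var_between / var_total.
solve_icc_components <- function(ICC = NULL, var_within = NULL,
                                 var_between = NULL, var_total = NULL) {
  given <- !vapply(list(ICC, var_within, var_between, var_total),
                   is.null, logical(1))
  if (sum(given) != 2) {
    stop("Specify exactly two of ICC, var_within, var_between, var_total.")
  }
  if (!is.null(ICC) && !is.null(var_total)) {
    var_between <- ICC * var_total
    var_within <- var_total - var_between
  } else if (!is.null(ICC) && !is.null(var_within)) {
    if (ICC >= 1) stop("ICC must be < 1 when var_within is given.")
    var_total <- var_within / (1 - ICC)
    var_between <- var_total - var_within
  } else if (!is.null(ICC) && !is.null(var_between)) {
    if (ICC <= 0) stop("ICC must be > 0 when var_between is given.")
    var_total <- var_between / ICC
    var_within <- var_total - var_between
  } else if (!is.null(var_total) && !is.null(var_within)) {
    var_between <- var_total - var_within
  } else if (!is.null(var_total) && !is.null(var_between)) {
    var_within <- var_total - var_between
  } else {
    # var_within and var_between were given
    var_total <- var_within + var_between
  }
  if (is.null(ICC)) ICC <- var_between / var_total
  out <- c(ICC = ICC, var_within = var_within,
           var_between = var_between, var_total = var_total)
  # Reject inputs that imply an impossible decomposition.
  if (anyNA(out) || any(out < 0) || out[["ICC"]] > 1) {
    stop("Inputs imply a negative variance or an ICC outside [0, 1].")
  }
  out
}

solve_icc_components(ICC = 0.2, var_total = 4)  # var_between 0.8, var_within 3.2
```

A total_sd argument would map onto this as `var_total = total_sd^2`, and the derived sds are the square roots of the returned variances.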
I agree that the solution I came up with is imperfect: the other three quantities (ICC, within, between) are targets that the random draws only hit approximately, while total_sd is hit exactly as a mechanical consequence of the scaling.
Opening this issue so that we can discuss it.
I think rescaling is generally not what people would expect. For example, people don't usually expect
sd(rnorm(100)) == 1
to hold exactly: there's sampling variability there. I also think that when ICC = 0 we should behave the same as rnorm.
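A minimal sketch of the no-rescaling alternative suggested here, under the assumption that the ICC and total_sd are split into between and within sds via var_between = ICC * total_sd^2 and var_within = (1 - ICC) * total_sd^2. The function name `draw_icc_no_rescale` is hypothetical. The sample sd then matches total_sd only in expectation, like sd(rnorm(100)). Skipping the between-cluster draw when ICC = 0 makes the output identical to rnorm(n, sd = total_sd) under the same seed, not merely equal in distribution.

```r
# Hypothetical sketch: draw with sds implied by ICC and total_sd, no rescaling.
draw_icc_no_rescale <- function(clusters, ICC, total_sd) {
  stopifnot(ICC >= 0, ICC <= 1, total_sd >= 0)
  sd_between <- sqrt(ICC) * total_sd      # var_between = ICC * total_sd^2
  sd_within <- sqrt(1 - ICC) * total_sd   # var_within = (1 - ICC) * total_sd^2
  cluster_index <- as.integer(factor(clusters))
  n_clusters <- max(cluster_index)
  # Only consume random numbers for the cluster effect when ICC > 0, so that
  # ICC = 0 reproduces rnorm(n, sd = total_sd) draw for draw.
  between <- if (ICC > 0) rnorm(n_clusters, sd = sd_between)[cluster_index] else 0
  within <- rnorm(length(clusters), sd = sd_within)
  between + within
}

set.seed(42)
x <- draw_icc_no_rescale(rep(1:10, each = 10), ICC = 0, total_sd = 2)
set.seed(42)
y <- rnorm(100, sd = 2)
identical(x, y)  # TRUE
sd(x)            # close to, but in general not exactly, 2
```

Under this design, total_sd becomes a target like the other quantities rather than a guarantee, which is the behaviour users already expect from rnorm.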