Flaky test on dummyVars
MichaelChirico opened this issue
This test is flaky:
It fails whenever some value in `1:15` is missing from `sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15))`.
That happens about 1.5% of the time (I was too lazy to work out the exact probability), as a quick simulation shows:
```r
mean(replicate(1e6, all(1:15 %in% sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15)))))
# [1] 0.984922
```
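For reference, the exact probability can be worked out with inclusion-exclusion over which levels go missing. With $k$ equally likely levels and $n$ independent draws,

$$P(\text{all } k \text{ levels appear}) = \sum_{j=0}^{k} (-1)^j \binom{k}{j} \left(1 - \frac{j}{k}\right)^n.$$

For $k = 15$, $n = 100$ the $j = 1$ term dominates: $15 \cdot (14/15)^{100} \approx 0.0151$, so the failure rate is about 1.5% (close to, but not exactly, 1.5%), in line with the simulated 0.984922. A small sketch evaluating the sum; the function name `prob_all_levels` is just for illustration:

```r
# Hypothetical helper: probability that all k equally likely levels appear
# at least once in n independent draws, via inclusion-exclusion over the
# number j of levels that are missing.
prob_all_levels <- function(k, n) {
  j <- 0:k
  sum((-1)^j * choose(k, j) * (1 - j / k)^n)
}

p_all <- prob_all_levels(15, 100)
p_all      # exact probability that the test passes
1 - p_all  # exact probability that the test fails
```

The two printed values should agree with the simulation above to within Monte Carlo error.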
Observe:
```r
# draw samples until at least one value of 1:15 is missing
repeat {
  entry <- sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15))
  if (!all(1:15 %in% entry)) break
}
# now run the rest of the test on that sample; the factor levels are built
# only from the values actually present in `entry`
data <- data.frame(matrix(rep(as.factor(entry), 15), ncol = 15), stringsAsFactors = TRUE)
essai_dummyVars <- caret::dummyVars(stats::as.formula(paste0("~ ", colnames(data), collapse = "+")), data)
# expected dummy columns: X1.1, ..., X15.15 (every column crossed with every level)
exp_names_lvls <- apply(expand.grid(paste0("X", 1:15), paste0(".", 1:15)), 1, paste, collapse = "")
res_names_lvls <- colnames(predict(essai_dummyVars, data))
all(exp_names_lvls %in% res_names_lvls)
# [1] FALSE
```
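One possible way to make the test deterministic (a sketch, not necessarily how the test should be fixed) is to place each of the 15 values once, fill the remaining 85 slots at random, and shuffle. Every level then appears by construction, so all expected dummy columns always exist. The helper name `sample_covering` is hypothetical, and the draws are no longer i.i.d. uniform, which should not matter for what this test checks.

```r
# Hypothetical helper: draw n values from 1:k such that every value in 1:k
# appears at least once (requires n >= k). The first k slots cover each
# level once, the remaining n - k are drawn uniformly, then all n are shuffled.
sample_covering <- function(k, n) {
  stopifnot(n >= k)
  draws <- c(seq_len(k), sample.int(k, size = n - k, replace = TRUE))
  draws[sample.int(n)]
}

entry <- sample_covering(15, 100)
all(1:15 %in% entry)
# [1] TRUE
```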