inconsistent error rates when using perf.mint.splsda and tune.mint.splsda

Question

aljabadi opened this issue 5 years ago · 0 comments

The error rate that tune.mint.splsda reports at the optimal hyperparameters (the selected keepX):

library(mixOmics)

data(stemcells)
X = stemcells$gene
Y = stemcells$celltype
study = stemcells$study

# tune keepX on each component, holding out one study at a time
tune.mint = tune.mint.splsda(X = X, Y = Y, study = study, ncomp = 2,
                             test.keepX = seq(1, 100, 5),
                             dist = "max.dist", progressBar = FALSE)
plot(tune.mint)

should be similar to the error rate that perf.mint.splsda reports for a model fitted with the same hyperparameters, but the two are inconsistent:

mint.splsda.res = mint.splsda(X = X, Y = Y, study = study, ncomp = 2,
                              keepX = tune.mint$choice.keepX)

mint.splsda.res # lists useful functions that can be used with a MINT object

perf.mint = perf.mint.splsda(mint.splsda.res, progressBar = FALSE, dist = 'max.dist')

plot(perf.mint)

A possible solution is to make LOGOCV and perf.mint.splsda (and possibly the other perf functions) call the same internal function, one that holds out each study in turn for testing, and then to check that the error rates they return are identical.
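
To illustrate the idea, here is a minimal base R sketch in which both the tuning and the performance routine go through one shared leave-one-study-out helper. None of this is mixOmics code: `logo_cv_error`, `fit_centroid`, `predict_centroid`, `tune_sketch` and `perf_sketch` are hypothetical names, the toy nearest-centroid classifier stands in for mint.splsda, and the unweighted mean over studies is an assumption rather than what the package does.

```r
# Hypothetical shared helper: hold out each study in turn, fit on the rest,
# and return the misclassification rate on the held-out study.
logo_cv_error <- function(X, Y, study, fit_fun, predict_fun) {
  study <- factor(study)
  sapply(levels(study), function(s) {
    test <- study == s
    model <- fit_fun(X[!test, , drop = FALSE], Y[!test])
    pred <- predict_fun(model, X[test, , drop = FALSE])
    mean(pred != as.character(Y[test]))
  })
}

# Toy stand-in for the model: nearest centroid on the `keep` most variable
# features (selected on the training samples only).
fit_centroid <- function(keep) {
  function(X, Y) {
    vars <- order(apply(X, 2, var), decreasing = TRUE)[seq_len(keep)]
    centroids <- do.call(rbind, lapply(levels(Y), function(cl) {
      colMeans(X[Y == cl, vars, drop = FALSE])
    }))
    rownames(centroids) <- levels(Y)
    list(vars = vars, centroids = centroids)
  }
}

predict_centroid <- function(model, X) {
  Xs <- X[, model$vars, drop = FALSE]
  # squared Euclidean distance from every sample to every class centroid
  d <- sapply(seq_len(nrow(model$centroids)), function(k) {
    rowSums(sweep(Xs, 2, model$centroids[k, ], "-")^2)
  })
  d <- matrix(d, nrow = nrow(Xs))
  rownames(model$centroids)[apply(d, 1, which.min)]
}

# Both wrappers delegate to logo_cv_error, so they cannot disagree.
tune_sketch <- function(X, Y, study, test_keep) {
  err <- sapply(test_keep, function(k) {
    mean(logo_cv_error(X, Y, study, fit_centroid(k), predict_centroid))
  })
  names(err) <- test_keep
  list(error = err, choice = test_keep[which.min(err)])
}

perf_sketch <- function(X, Y, study, keep) {
  mean(logo_cv_error(X, Y, study, fit_centroid(keep), predict_centroid))
}

# Simulated data: 3 studies, 2 classes, one informative feature.
set.seed(1)
n <- 60
p <- 10
sim_study <- rep(c("s1", "s2", "s3"), each = 20)
sim_Y <- factor(rep(c("a", "b"), times = 30))
sim_X <- matrix(rnorm(n * p), n, p)
sim_X[, 1] <- sim_X[, 1] + 2 * (sim_Y == "b")

tuned <- tune_sketch(sim_X, sim_Y, sim_study, test_keep = c(1, 3, 5))
perf_err <- perf_sketch(sim_X, sim_Y, sim_study, keep = tuned$choice)

# The error at the tuned value matches the error from the perf routine.
stopifnot(isTRUE(all.equal(perf_err, unname(min(tuned$error)))))
```

A check like the final stopifnot could serve as a regression test once both functions share the same internal.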