malucalle/selbal

How large an "n" is necessary?


Hello,

I was exploring selbal this morning and got an error, which must indicate that I don't have enough samples in each of my two classes:

Error in (function (n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"), : 'n' is too small

I have 134 samples in total: 90 in one class ("control") and 44 in the other ("experiment"). For future reference, how many samples would one need in order to use selbal?

Thanks, looks like a great tool!

Hi @binzo21 !

First of all, thank you for using selbal. Regarding your question, the "n" you describe is enough to use this package's functions, so I suspect the problem lies elsewhere.
How many rows and columns does your data have? What is the exact command you ran before getting the error message?

Thank you,

Hi, thanks for getting back to me.

> What is the exact command you ran before getting the error message?

CV.BAL.dic <- selbal.cv(x=x, y=y, n.fold = 5, n.iter=10, logit.acc = "AUC")

Error in (function (n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"), : 'n' is too small

The example in the vignette works just fine.

> How many rows and columns does your data have?

x is a data.frame with 134 rows (samples) and 67 columns (microbes)
y is a factor of length 134

Thanks! If there's anything else I can try, please let me know.
Lindsay

Hi again @binzo21!

I think we are missing something. In principle, your data set is large enough to run selbal.cv(); in fact, its dimensions are similar to those of the data sets included in the package. Have you tried the examples to check that they work?

Just to rule out possible issues with the data, please tell me the result of running the following lines:

class(y)

str(x) (check here that all the columns are numeric, i.e. of type num)
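A minimal sketch of these checks bundled into one base R helper, assuming `x` is a samples-by-taxa data frame and `y` a factor of class labels. The helper name `check_inputs()` is hypothetical (not part of selbal), and the simulated data only mimics the 134 by 67 dimensions described above; it is not the reporter's data.

```r
# Hypothetical simulated data with the same shape as in this thread
set.seed(1)
x <- as.data.frame(matrix(rpois(134 * 67, lambda = 5), nrow = 134))
y <- factor(rep(c("control", "experiment"), times = c(90, 44)))

# Hypothetical helper: basic sanity checks before calling selbal.cv()
check_inputs <- function(x, y) {
  list(
    y_class      = class(y),                    # expected: "factor"
    dims         = dim(x),                      # rows = samples, columns = taxa
    same_length  = nrow(x) == length(y),        # one label per sample
    # is.numeric() is TRUE for both integer ("int") and double ("num") columns
    all_numeric  = all(vapply(x, is.numeric, logical(1))),
    any_missing  = anyNA(x) || anyNA(y),
    class_counts = table(y)                     # samples per class
  )
}

res <- check_inputs(x, y)
print(res)
```

Printing `res` gives the same information as `class(y)` and `str(x)`, plus the per-class sample counts that matter for cross-validation.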

Hi @UVic-omics,

I am getting the same error:
Error in (function (n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"), : 'n' is too small

I could run the vignette example just fine. My data has 2 samples (rows) and 308 OTUs (columns).
class(y) is factor.
str(x) returns "int" for all 308 OTUs.

Hi @ayazagan!

What is the goal of your analysis? Two samples are far too few to run any of the cross-validated methods. If you only want to look at differences between two samples, just run selbal().
Nevertheless, be careful with your results: with two samples you will get a perfect classification.
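Why two samples cannot work with cross-validation can be seen with a rough rule of thumb: in k-fold cross-validation every fold needs at least one held-out sample, and if folds are stratified by class, each class should have at least `n.fold` members. The sketch below encodes that heuristic in base R. It is an assumption for illustration, not selbal's actual internal check, and `cv_looks_feasible()` is a hypothetical name.

```r
# Hypothetical heuristic (NOT selbal's internal check): k-fold CV needs
# at least n.fold samples overall, and stratified folds ideally need at
# least n.fold samples in every class.
cv_looks_feasible <- function(y, n.fold = 5) {
  y <- as.factor(y)
  counts <- table(y)
  length(y) >= n.fold && all(counts >= n.fold)
}

y_two   <- factor(c("A", "B"))  # two samples, as in the report above
y_large <- factor(rep(c("control", "experiment"), times = c(90, 44)))

cv_looks_feasible(y_two, n.fold = 5)    # FALSE
cv_looks_feasible(y_large, n.fold = 5)  # TRUE
```

By this heuristic the 90/44 split from the original question is comfortably large enough for `n.fold = 5`, while two samples fail immediately.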

Please ask if you have any other questions.

Best regards,