drisso/zinbwave

zinbwave design matrix


Hi,

I want to use zinbwave for my single-cell RNA-seq analysis. My plan is to run zinbwave followed by edgeR, and I would like to clarify how the design matrix should be passed to zinbwave.

In my experiment, I have single cells from two conditions that I would like to compare. However, I would also like to regress out the number of detected genes (NODG) and batch (the single cells were sequenced in multiple batches). Below is my zinbwave command:

sezinbwave <- zinbwave(se, X="~NODG + batch + condition", residuals = TRUE, normalizedValues = TRUE, epsilon=1e12)

I read in #33 that the design matrix has to be the same for zinbwave and edgeR. Is that correct? Or should I pass in only batch and the number of detected genes, as shown in the vignette?

sezinbwave <- zinbwave(se, X="~NODG + batch", residuals = TRUE, normalizedValues = TRUE, epsilon=1e12)

Thank you very much.

Hi @JoannaTan ,

The design matrix does not have to be the same as edgeR's, but it makes sense for it to be. Since you will be using the weights in a model with that design, there is no reason not to use the same design when estimating the weights.
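To make the "same design in both steps" advice concrete, here is a minimal sketch on simulated toy data. The toy counts, the `design_formula` variable, and the column names are assumptions for illustration, not part of the original question. Note that the weights are only returned when `observationalWeights = TRUE` is set, which the commands above do not include; the downstream steps follow the pattern shown in the zinbwave vignette.

```r
library(zinbwave)              # zinbwave(), glmWeightedF()
library(SummarizedExperiment)  # SummarizedExperiment(), assay(), colData()
library(edgeR)                 # DGEList(), estimateDisp(), glmFit()

set.seed(1)

# Toy data (assumption): 200 genes x 40 cells, 2 batches crossed with 2 conditions.
counts <- matrix(rnbinom(200 * 40, mu = 5, size = 0.5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:40)))
cd <- DataFrame(batch = factor(rep(c("b1", "b2"), times = 20)),
                condition = factor(rep(c("ctrl", "trt"), each = 20)),
                row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = cd)

# One formula, used for both the weight estimation and the edgeR model.
design_formula <- "~ batch + condition"

# Step 1: estimate zero-inflation weights with zinbwave.
zinb <- zinbwave(se, X = design_formula,
                 observationalWeights = TRUE, epsilon = 1e12)
weights <- assay(zinb, "weights")

# Step 2: weighted edgeR analysis with the same design.
dge <- DGEList(counts = assay(zinb, "counts"))
dge <- calcNormFactors(dge)
design <- model.matrix(as.formula(design_formula),
                       data = as.data.frame(colData(zinb)))
dge$weights <- weights
dge <- estimateDisp(dge, design)
fit <- glmFit(dge, design)

# Test the last coefficient (condition) with the zero-inflation-adjusted F test.
res <- glmWeightedF(fit, coef = ncol(design))
topTags(res)
```

Keeping the formula in one variable is just a convenience so the two steps cannot drift apart.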

On a different note, are you sure that the inclusion of the number of detected genes is needed in your analysis? By accounting for zero inflation via the weights I would imagine that you do not need to regress out the number of expressed genes. Have you tried a model without that covariate?

Hi @drisso,

Thank you for the reply.

I have tried the model without the number of detected genes. By plotting the correlation between the first 5 PCs and the different confounding factors, I noted that the effect contributed by the number of detected genes was removed. Thank you for the suggestion.

I have another question. Prior to running zinbwave, I did some pre-filtering of genes. Is there a minimum number of genes that I need to pass into zinbwave?
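For readers who want to run a similar check, the following is a rough sketch in base R of one way to relate the leading PCs to candidate confounders. It is not the exact plot described above: the data are simulated, the variable names (`expr`, `nodg`, `batch`) are hypothetical, and R-squared from a per-PC linear model is used instead of a plain correlation so that a factor such as batch can be handled as well.

```r
set.seed(1)
n_cells <- 60
n_genes <- 100

# Hypothetical covariates: number of detected genes and a 3-level batch factor.
nodg  <- runif(n_cells, 1000, 5000)
batch <- factor(rep(c("b1", "b2", "b3"), length.out = n_cells))

# Toy log-expression matrix (genes x cells); the first 20 genes track NODG.
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes)
expr[1:20, ] <- expr[1:20, ] + rep(as.numeric(scale(nodg)) * 2, each = 20)

# PCA on cells (prcomp expects observations in rows).
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:5]

# R-squared of each PC regressed on each covariate (rows: PCs, columns: covariates).
assoc <- sapply(list(NODG = nodg, batch = batch), function(v) {
  apply(pcs, 2, function(pc) summary(lm(pc ~ v))$r.squared)
})
round(assoc, 2)
```

In a real analysis, `expr` would be replaced by the normalized values or residuals from the fitted model, and a large value in the NODG column would suggest that the covariate still drives one of the leading PCs.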

There is no minimum number of genes, but n * J (the number of cells times the number of genes, i.e. the number of observed counts) needs to be large enough to estimate all the parameters.
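As a back-of-the-envelope illustration of "large enough", the sketch below compares the number of observed counts, n * J, with a rough parameter count. The helper `zinb_param_count` is hypothetical, and the formula is an assumption based on the structure of the ZINB-WaVE model (sample-level covariates X with M columns, gene-level covariates V with L columns, K latent factors, and dispersion parameters), so treat the numbers as indicative only.

```r
# Hypothetical helper: approximate number of parameters in a ZINB-WaVE-style model.
# n: cells, J: genes, M_*: columns of X, L_*: columns of V, K: latent factors,
# n_disp: number of distinct dispersion parameters (1 if a common dispersion is used).
zinb_param_count <- function(n, J, M_mu = 1, M_pi = M_mu,
                             L_mu = 1, L_pi = L_mu, K = 0, n_disp = 1) {
  J * (M_mu + M_pi) +   # gene-specific coefficients for X (mean and zero-inflation parts)
    n * (L_mu + L_pi) + # cell-specific coefficients for V
    2 * K * J +         # gene loadings for the latent factors
    n * K +             # latent factors W
    n_disp              # dispersion
}

# Example (assumption): 500 cells, 2000 genes, X = intercept + batch + condition.
n <- 500
J <- 2000
n_par <- zinb_param_count(n, J, M_mu = 3)

n_par            # 13001
n * J            # 1e+06 observed counts
(n * J) / n_par  # observations per parameter
```

Aggressive gene filtering shrinks J, and with it n * J, while the cell-level terms stay the same size, so the ratio in the last line is the quantity to watch.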