drisso/zinbwave

What's the best way to adjust the sample batch covariable in the differential expression?

Closed this issue · 3 comments

Hi, drisso, I have just moved on the the differential expression of single-cell RNA-seq, I used to apply DESeq2 to analysis bulk RNA-seq dataset, where we can correct for the batch effects by including them in the design matrix of DESeq2. further, by combining zinbwave with DESeq2, we can analysis single-cell RNA-seq data, and both of them can adjust the sample level batch effect.

Let's hypothesize a dataset (names zinb) in a class of
SummarizedExperiment with a sample-level covariable (names batch) and a variable (names condition). Following, we'll want to extract the differential expression genes among the condition variable and to correct the batch effect; we'll take dataset zinb as the example data below, there will be three choice to do that with zinbwave and deseq2

First, Only correct for the batch effects with zinbwave:

zinb <- zinbwave(zinb, X = "~batch", K=0, observationalWeights=TRUE,
                   BPPARAM=SerialParam(), epsilon=1e12)
dds <- DESeqDataSet(zinb, design=~condition)
sizeFactors(dds) <- sizeFactors(scran::computeSumFactors(dds))
dds <- DESeq(dds, test="LRT", reduced=~1,
               minmu=1e-6, minRep=Inf)

the second, Only correct with DESeq2:

zinb <- zinbwave(zinb, K=0, observationalWeights=TRUE,
                   BPPARAM=SerialParam(), epsilon=1e12)
dds <- DESeqDataSet(zinb, design=~batch + condition)
sizeFactors(dds) <- sizeFactors(scran::computeSumFactors(dds))
dds <- DESeq(dds, test="LRT", reduced=~batch,
               minmu=1e-6, minRep=Inf)

the third, combine zinbwave with deseq2:

zinb <- zinbwave(zinb, X = "~batch", K=0, observationalWeights=TRUE,
                   BPPARAM=SerialParam(), epsilon=1e12)
dds <- DESeqDataSet(zinb, design=~batch + condition)
sizeFactors(dds) <- sizeFactors(scran::computeSumFactors(dds))
dds <- DESeq(dds, test="LRT", reduced=~batch,
               minmu=1e-6, minRep=Inf)

What's the best way to adjust the sample batch covariable in the differential expression of scRNA-seq data?

The short answer is that each method tries to correct the batch effect in a different way and depending on the data one could be preferable to the others. As usual, the best way to proceed is to "look at the data" through exploratory data analysis and proceed based on the specific situation that you see.

More specifically, since you are using zinbwave to estimate observational weights, the only effect that the model will have on differential expression is to downweight the influence of zero counts on DE analysis. This will remove the batch effect only if each batch has a different dropout rate and that's the only difference among the batches.

On the other hand, adding the batch covariate to the DESeq2 model will act on the mean of the distribution, hence correcting batch effects on the mean.

One could think that your third solution is the safest as it will act on both potential effects, but again looking at the e.g., the average expression by batch and the proportion of zeros by batch could tell you more about your batch effects and how to act on them.

PS. In the future, this type of questions is more suited for the Bioconductor support forum (support.bioconductor.org) than for github issues.

Hope this helps.
Davide

Thank u a lot!!!!I'm sorry for the incorrect location to push the questions

Can I have this understandings? if the variable can influence the zero counts, we should put it into zinbwave, so maybe sometimes, we may create code like this(put the condition variable into the parameter X in the zinbwave):

zinb <- zinbwave(zinb, X = "~batch + condition", K=0, observationalWeights=TRUE,
                   BPPARAM=SerialParam(), epsilon=1e12)
dds <- DESeqDataSet(zinb, design=~batch + condition)
sizeFactors(dds) <- sizeFactors(scran::computeSumFactors(dds))
dds <- DESeq(dds, test="LRT", reduced=~batch,
               minmu=1e-6, minRep=Inf)

Thanks for ur help

Yes, I think this code looks good.