Unexpected behavior when subsetting on decontaminated counts
jmodlis opened this issue · 1 comment
Hello,
Thank you for this great tool! I have noticed some unexpected behavior when running Seurat's subset function in downstream analyses. It appears that I have to call subset twice when using the decontaminated counts from DecontX. I understand that the decontaminated counts will be lower than the original counts, so it makes sense that the nCount_RNA metadata column, for example, would not reflect the decontaminated counts. But I don't understand why calling subset twice makes it work, or where summary(s2l@meta.data$nCount_RNA) is actually pulling its information from.
The cleanest solution I can come up with is to set sce$nCount_RNA <- NULL and sce$nFeature_RNA <- NULL prior to calling CreateSeuratObject. This seems to make Seurat recalculate the metrics, and subset then behaves as expected downstream. See below for the unexpected behavior. This is more than likely a Seurat issue, but it will affect users of your tool.
> sl <- CreateSeuratObject(counts=counts(sce),
+ meta.data=as.data.frame(colData(sce)))
> sl <- subset(sl, subset=(nFeature_RNA > nFeature_RNA.ll & nFeature_RNA < nFeature_RNA.ul) & (nCount_RNA > nCount_RNA.ll &nCount_RNA < nCount_RNA.ul) & percent.mt < percent.mt.ul)
> summary(sl@meta.data$nCount_RNA)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2001 6203 10536 11039 14841 38710
> dim(sl@meta.data)
[1] 13385 14
>
> r <- round(decontXcounts(sce))
> s2l <- CreateSeuratObject(counts=r,
+ meta.data=as.data.frame(colData(sce)))
> s2l <- subset(s2l, subset=(nFeature_RNA > nFeature_RNA.ll & nFeature_RNA < nFeature_RNA.ul) & (nCount_RNA > nCount_RNA.ll &nCount_RNA < nCount_RNA.ul) & percent.mt < percent.mt.ul)
> summary(s2l@meta.data$nCount_RNA)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4 5544 9783 10253 13948 38688
> dim(s2l@meta.data)
[1] 13385 14
> s2l <- subset(s2l, subset=(nFeature_RNA > nFeature_RNA.ll & nFeature_RNA < nFeature_RNA.ul) & (nCount_RNA > nCount_RNA.ll &nCount_RNA < nCount_RNA.ul) & percent.mt < percent.mt.ul)
> summary(s2l@meta.data$nCount_RNA)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2001 5970 10021 10524 14080 38688
> dim(s2l@meta.data)
[1] 12987 14
>
Hi @jmodlis, thanks for trying out our tool! I'm not totally sure. What is stored in colData(sce)? If variables such as nFeature_RNA and nCount_RNA are in the colData, then you may want to exclude them from the metadata when creating a new Seurat object so that Seurat can recalculate them.
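A minimal sketch of that suggestion, in case it helps. The names cell_meta, stale_cols, and clean_meta are hypothetical, and the small data frame is a made-up stand-in for as.data.frame(colData(sce)). The commented lines show how the same step might look with the objects from this issue; they have not been run against that data.

```r
# Hypothetical stand-in for as.data.frame(colData(sce)); values are made up.
cell_meta <- data.frame(
  nCount_RNA   = c(2500, 4100, 3900),
  nFeature_RNA = c(900, 1500, 1300),
  percent.mt   = c(2.1, 4.7, 3.3),
  row.names    = c("cell1", "cell2", "cell3")
)

# Per-cell totals that Seurat computes from the counts matrix it is given.
# Dropping them keeps values based on the original counts out of the metadata.
stale_cols <- c("nCount_RNA", "nFeature_RNA")
clean_meta <- cell_meta[, !(colnames(cell_meta) %in% stale_cols), drop = FALSE]

# Only the columns that are not derived from the counts remain.
print(colnames(clean_meta))

# With the objects from this issue, the same step might look like this
# (assumes Seurat, celda, and SingleCellExperiment are loaded):
# clean_meta <- as.data.frame(colData(sce))
# clean_meta <- clean_meta[, !(colnames(clean_meta) %in% stale_cols), drop = FALSE]
# r <- round(decontXcounts(sce))
# s2l <- CreateSeuratObject(counts = r, meta.data = clean_meta)
#
# Sanity check: the stored totals should match the matrix that was passed in.
# summary(Matrix::colSums(r))
# summary(s2l$nCount_RNA)
```

Using drop = FALSE keeps the result a data frame with its row names (the cell barcodes) even if only one column is left, which matters because the metadata rows are matched to cells by name.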