constantAmateur/SoupX

Working with inaccurate cell calling data

RuiyuRayWang opened this issue · 1 comment

Due to the nature of my biological sample, CellRanger's cell calling is inaccurate and my filtered_feature_bc_matrix contains many empty droplets.

I was able to confirm this because my UMAP contains a cluster with very few detected genes and a very high fraction of mitochondrial transcripts.
[Image: UMAP screenshot]

Tuning the --force-cells parameter in cellranger is difficult because it's hard to find the exact threshold at which a barcode should be called a cell. See my barcode rank plot below:
[Image: barcode rank plot]

My question is, can SoupX work with data like this?
Should I manually remove the empty droplet population before or after SoupX?

Thanks!
Ray

Broadly speaking, it shouldn't matter much as long as there are more real cells than miscalled empty droplets. Ideally you'd remove them before running SoupX, but it really shouldn't make much difference.

If you'd like to check, try running SoupX twice, once with and once without these cells, and verify that the estimated contamination fraction is similar.
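
For anyone wanting to try that comparison, here is a minimal sketch in R. It assumes `tod` and `toc` are the raw and filtered count matrices (for example, read with `Seurat::Read10X` from a single-modality run), and that `clusters` is a named vector of cluster labels keyed by barcode. The helper names `estimate_rho` and `rho_similar`, the `empty_barcodes` vector, and the 0.02 tolerance are all made up for illustration and are not part of SoupX.

```r
# Hypothetical helper: estimate the global contamination fraction (rho),
# optionally after dropping a set of barcodes from the filtered matrix.
# Assumes `clusters` is a named vector whose names are cell barcodes.
estimate_rho <- function(tod, toc, clusters, drop_barcodes = character(0)) {
  keep <- setdiff(colnames(toc), drop_barcodes)
  sc <- SoupX::SoupChannel(tod, toc[, keep, drop = FALSE])
  sc <- SoupX::setClusters(sc, clusters[keep])
  sc <- SoupX::autoEstCont(sc, doPlot = FALSE)
  # autoEstCont stores one global estimate, repeated for every cell
  unique(sc$metaData$rho)
}

# Hypothetical helper: are two estimates within an (arbitrary) tolerance?
rho_similar <- function(rho_a, rho_b, tol = 0.02) {
  abs(rho_a - rho_b) <= tol
}

# Usage (not run here; needs SoupX and your own data):
# rho_all   <- estimate_rho(tod, toc, clusters)
# rho_clean <- estimate_rho(tod, toc, clusters, drop_barcodes = empty_barcodes)
# rho_similar(rho_all, rho_clean)
```

If the two estimates differ noticeably, that would suggest the miscalled droplets are influencing the fit, and removing them first is the safer choice.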