aertslab/cisTopic

Distinguishing broken cells from true low-depth cells

wangmeijiao opened this issue · 3 comments

Hi all,
cisTopic is a novel tool for analyzing single-cell ATAC data, which applies the latent Dirichlet allocation (LDA) algorithm to reduce dimensionality. It was reported that cisTopic can specifically handle cells at low depth (around 3k per cell). This is very useful given the technical difficulties of single-cell experiments and sequencing. Here is my question: if I understand correctly, cells from the simulated data in the cisTopic paper (cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data, Nature Methods 2019) generally have low depth (Supplementary Figures 1, 2 and 4). But what if a real dataset contains both high-depth and low-depth cells that were captured and sequenced together? Will cisTopic treat them properly, without bias for depth level?
My question stems from the concern of distinguishing broken cells from true low-depth cells. Some tools (like snapATAC and cellranger-atac) normalize all cells by their depth, while others encourage users to prefilter low-depth cells. How do you suggest treating scATAC datasets that contain both high-depth and low-depth cells?

ADD: Sequencing depth is an important factor to consider in dimensionality reduction, clustering and inferring developmental trajectories. It is hard to balance filtering out low-quality cells against keeping low-depth cells that may carry biological meaning. I posted my question here in case others are also interested in it. Please also refer to this thread (stuart-lab/signac#122). cisTopic's ability to handle low-depth cells is an important advantage. I just want to make sure that cisTopic can treat low-depth and high-depth cells in one experiment without depth bias.

Hi @wangmeijiao !

We have never had issues with depth bias with cisTopic. The number of assignments per cell is proportional to the depth, and the assignments are normalised when calculating topic contributions (either by dividing by the total number of assignments per cell ('Probability') or by using a Z-score).
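
To illustrate why these two normalisations remove the dependence on depth, here is a minimal sketch in Python with NumPy. It is not cisTopic's actual code (cisTopic is an R package). The matrix `assignments`, the function names, and the choice to standardise each cell across topics are assumptions made for this example. Cells 0 and 1 have the same topic mix, but cell 1 is 100 times deeper.

```python
import numpy as np

# Hypothetical topic-by-cell assignment counts (rows: topics, columns: cells).
# Cell 0 is shallow (30 assignments); cell 1 has the same mix at 100x the depth.
assignments = np.array([
    [20.0, 2000.0,  50.0],
    [ 6.0,  600.0, 150.0],
    [ 4.0,  400.0, 100.0],
])


def probability_normalise(counts):
    """Divide each cell (column) by its total number of assignments."""
    return counts / counts.sum(axis=0, keepdims=True)


def zscore_normalise(counts):
    """Standardise each cell (column) across topics (assumed axis)."""
    mean = counts.mean(axis=0, keepdims=True)
    std = counts.std(axis=0, keepdims=True)
    return (counts - mean) / std


prob = probability_normalise(assignments)
z = zscore_normalise(assignments)

# Depth per cell differs by two orders of magnitude...
print("depth per cell:", assignments.sum(axis=0))
# ...but the normalised topic contributions of cells 0 and 1 coincide.
print("probability:\n", prob.round(3))
print("z-score:\n", z.round(3))
```

In this toy example the shallow and the deep cell end up with identical topic contributions under both schemes, so only the relative topic mix, not the raw depth, is carried into downstream clustering or dimensionality reduction.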

Hope this answers the question!

C

Oops! Thanks for your response and ... two years later (: . We just filtered out the low-quality cells. It seems that cisTopic does indeed take sequencing depth into account. Thanks again for describing the internal details. @cbravo93