morris-lab/CellTagR

similarities of the cell tag barcodes within clones


Hi,

Thanks for developing the powerful CellTag tools! We are now analyzing our single-cell RNA data.

We only used V1 CellTags, so we expect to be able to do clonal analysis only.

In the final results, we found that clonally related cells did not have exactly the same barcodes. For example, the three cells within a clone may have 2, 3, and 4 barcodes respectively, but all of them share the same 2 barcodes. Is this normal?

We went through the main steps in https://github.com/morris-lab/CellTagR. Note that we did not do the following: filter transgene reads or CellTag Error Correction.

P.S. I think the following step might be the important one, but I cannot set correlation.cutoff to 1. When I change the value from 0.7 to 0.99999999, the clones get smaller, but the issue remains:

#Call clones
bam.test.obj <- CloneCalling(celltag.obj = bam.test.obj, correlation.cutoff=0.7)

I am now writing a short script to find cells with exactly the same barcodes.

Best,

Li

The clone calling part of the pipeline is a little more elaborate than simple Jaccard score filtering. It creates a cell-cell graph based on thresholded Jaccard similarity and identifies clusters within this graph. It is not unusual to observe what you are seeing in your dataset, i.e., one clone containing 2, 3, 4 barcodes across clonally related cells. This helps account for missing data/limited information capture during single cell sequencing.
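
As a rough illustration of the idea described above (not the CellTagR implementation itself), the following base-R sketch builds a toy binary cell-by-CellTag matrix, computes pairwise Jaccard similarity, keeps pairs above a cutoff, and groups cells into connected components. The matrix, the cutoff of 0.6, and the helper names `jaccard` and `find_components` are all made up for this example.

```r
# Toy binary matrix: rows are cells, columns are CellTags (hypothetical data).
binary.mtx <- rbind(
  cellA = c(1, 1, 0, 0, 0),
  cellB = c(1, 1, 1, 0, 0),
  cellC = c(1, 1, 1, 1, 0),
  cellD = c(0, 0, 0, 0, 1)
)
colnames(binary.mtx) <- paste0("tag", 1:5)

# Jaccard similarity of two binary vectors: |intersection| / |union|.
jaccard <- function(a, b) sum(a & b) / sum(a | b)

n <- nrow(binary.mtx)
cells <- rownames(binary.mtx)
jac <- matrix(0, n, n, dimnames = list(cells, cells))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    jac[i, j] <- jaccard(binary.mtx[i, ], binary.mtx[j, ])
  }
}

# Adjacency matrix: strict '>' threshold, no self-edges.
cutoff <- 0.6  # illustrative value only
adj <- jac > cutoff
diag(adj) <- FALSE

# Label connected components by repeatedly expanding to unlabeled neighbors.
find_components <- function(adj) {
  labels <- rep(NA_integer_, nrow(adj))
  names(labels) <- rownames(adj)
  current <- 0L
  for (start in seq_len(nrow(adj))) {
    if (!is.na(labels[start])) next
    current <- current + 1L
    frontier <- start
    while (length(frontier) > 0) {
      labels[frontier] <- current
      neighbors <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      frontier <- neighbors[is.na(labels[neighbors])]
    }
  }
  labels
}

clone.id <- find_components(adj)

# Treat only components with at least two cells as clones.
clone.sizes <- table(clone.id)
clones <- clone.id[clone.id %in% names(clone.sizes)[clone.sizes > 1]]
```

With these toy values, cellA and cellC have a Jaccard similarity of 0.5 and are not directly linked, yet they end up in the same component through cellB. That is how cells carrying 2, 3, and 4 CellTags can be assigned to one clone.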

A correlation cutoff of 1 won't work because the thresholding code uses '>' rather than '>=', and no Jaccard score can be greater than 1:

test.df.sub <- test.df[which(test.df$x > correlation.cutoff), ]
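
A small sketch of the effect, using a made-up data frame in place of the package's internal `test.df` (the `cell1` and `cell2` columns and all values are assumptions for illustration; only the column name `x` comes from the line above):

```r
# Hypothetical pairwise Jaccard scores between cells.
test.df <- data.frame(
  cell1 = c("cellA", "cellA", "cellB"),
  cell2 = c("cellB", "cellC", "cellC"),
  x     = c(1, 0.75, 0.5)
)

correlation.cutoff <- 1

# Strict '>' drops every pair, because a Jaccard score never exceeds 1.
strict.df <- test.df[which(test.df$x > correlation.cutoff), ]

# For comparison only: '>=' would keep pairs with identical signatures.
inclusive.df <- test.df[which(test.df$x >= correlation.cutoff), ]
```

Here `strict.df` has no rows, while `inclusive.df` keeps the single pair whose score is exactly 1. The second comparison is shown only for contrast; it is not what the package does.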

If you wish to retain clonally related cells with exactly identical barcode signatures, for maximum stringency, you can skip the clone calling part and manually pick such cells from the metric filtered count matrix.
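
One way to do this by hand, sketched in base R under the assumption that the metric filtered matrix is available as an ordinary binary matrix with cells in rows and CellTags in columns. The toy `filtered.mtx` and the variable names below are hypothetical and not part of CellTagR; a sparse matrix would likely need to be converted with `as.matrix()` first.

```r
# Hypothetical metric filtered binary matrix: rows are cells, columns are CellTags.
filtered.mtx <- rbind(
  cell1 = c(1, 1, 0, 0),
  cell2 = c(1, 1, 0, 0),
  cell3 = c(1, 1, 1, 0),
  cell4 = c(0, 0, 0, 1),
  cell5 = c(1, 1, 1, 0)
)
colnames(filtered.mtx) <- paste0("tag", 1:4)

# Collapse each row into a signature string such as "1100".
signature <- apply(filtered.mtx, 1, paste, collapse = "")

# Group cell names by signature and keep groups with at least two cells.
groups <- split(names(signature), signature)
exact.clones <- groups[lengths(groups) >= 2]
```

In this toy example `exact.clones` contains two groups: cell1 with cell2, and cell3 with cell5. cell4 is dropped because no other cell shares its signature.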

Thanks for your detailed explanation! That is a very good point to consider: "This helps account for missing data/limited information capture during single cell sequencing."