focusing on 'multiplets'

Question

Closed this issue 4 years ago · 3 comments

I am trying to analyse a dataset generated similarly to MULTI-seq, but with fewer sample barcodes.

Unlike standard MULTI-seq, we are focusing specifically on 'multiplets',
e.g. which cells have valid barcode1 and barcode2, and which cells have valid barcode3, barcode4 and barcode5.

I am not an expert in statistics, and I wonder if we can:

Remove "negative" cells by classification, so that the remaining cells are the ones we want.
OR, simply, manually set a counts threshold, such as barcode 1 > 5 & barcode 2 > 5 to call bar1&bar2 doublets, and so on.

Which way do you think is mathematically reasonable? Any better ideas are welcome.

Answer 1 · 2020-04-24T17:11:59.000Z

Hey @MichaelPeibo -- sorry for not responding to your email!

Is there a reason why you can't just classify the cells as described in the tutorial and perform downstream analyses on the cells called as multiplets by the algorithm?

Answer 2 · 2020-04-25T00:45:49.000Z

Hi @chris-mcginnis-ucsf,
This is exactly what I meant by:

Remove "negative" cells by classification, so that the remaining cells are the ones we want.

However, my goal is to find which two or more barcodes are valid in each cell. I am not sure whether deMULTIplex provides a metric that lets me determine that.
That is why I mentioned the second option:

OR, simply, manually set a counts threshold, such as barcode 1 > 5 & barcode 2 > 5 to call bar1&bar2 doublets, and so on.

That is, after deMULTIplex classification, I would apply an additional counts threshold.
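
To make the thresholding idea concrete, here is a minimal sketch in plain Python (for illustration only; deMULTIplex itself is an R package and this is not its API). It assumes the barcode counts are available as a per-cell mapping of barcode name to UMI count; the cell names, barcode names, counts, and the function `barcodes_above_threshold` are all hypothetical, and the threshold of 5 is just the example value from the question, not a recommended cutoff.

```python
def barcodes_above_threshold(counts, threshold=5):
    """For each cell, return the sorted tuple of barcodes whose count is > threshold."""
    return {
        cell: tuple(sorted(bc for bc, n in bar_counts.items() if n > threshold))
        for cell, bar_counts in counts.items()
    }


# Hypothetical toy counts: cell -> {barcode -> UMI count}.
counts = {
    "cell_A": {"Bar1": 120, "Bar2": 85, "Bar3": 2, "Bar4": 0, "Bar5": 1},
    "cell_B": {"Bar1": 3, "Bar2": 1, "Bar3": 60, "Bar4": 44, "Bar5": 51},
    "cell_C": {"Bar1": 4, "Bar2": 2, "Bar3": 1, "Bar4": 0, "Bar5": 3},
}

calls = barcodes_above_threshold(counts, threshold=5)

# Keep only cells with two or more barcodes above the threshold.
multiplets = {cell: bcs for cell, bcs in calls.items() if len(bcs) >= 2}

for cell, bcs in multiplets.items():
    print(cell, "&".join(bcs))
```

A single fixed threshold ignores differences in sequencing depth between barcodes, so in practice one would probably want to apply it only to cells that the classifier has already called as non-negative.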

Answer 3 · 2020-04-28T17:35:17.000Z

Ah gotcha. A simple approach would be to loop through the MULTI-seq barcode count matrix and identify the top two barcodes by UMI count for each classified doublet -- this will not identify higher-order multiplets, but these are quite rare in most datasets.
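
The loop described above could be sketched roughly as follows, again in plain Python for illustration rather than with the deMULTIplex R functions. It assumes the count matrix is a per-cell mapping of barcode to UMI count and that the classification is a mapping of cell to label in which doublets carry the label "Doublet"; that label, the data, and the function `top_two_barcodes` are assumptions for this example.

```python
def top_two_barcodes(counts, classifications, doublet_label="Doublet"):
    """Return the two highest-count barcodes for every cell classified as a doublet."""
    result = {}
    for cell, label in classifications.items():
        if label != doublet_label:
            continue  # skip singlets and negatives
        # Rank barcodes by UMI count, highest first. sorted() is stable,
        # so tied barcodes keep their original order in the mapping.
        ranked = sorted(counts[cell].items(), key=lambda kv: kv[1], reverse=True)
        result[cell] = tuple(bc for bc, _ in ranked[:2])
    return result


# Hypothetical toy data.
counts = {
    "cell_A": {"Bar1": 120, "Bar2": 85, "Bar3": 2},
    "cell_B": {"Bar1": 3, "Bar2": 1, "Bar3": 60},
    "cell_D": {"Bar1": 150, "Bar2": 10, "Bar3": 300},
}
classifications = {"cell_A": "Doublet", "cell_B": "Bar3", "cell_D": "Doublet"}

pairs = top_two_barcodes(counts, classifications)
for cell, (first, second) in pairs.items():
    print(cell, first, second)
```

Because only the top two barcodes are kept, a cell carrying three or more real barcodes would be reported as a pair, which matches the limitation noted above.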