chris-mcginnis-ucsf/MULTI-seq

focusing on 'multiplets'

Closed this issue · 3 comments

Hi @chris-mcginnis-ucsf

I am trying to analyse a dataset generated in a similar way to MULTI-seq, but with fewer sample barcodes.

Unlike the usual MULTI-seq use case, we are specifically interested in the 'multiplets',
e.g. which cells carry valid barcode1 and barcode2, and which cells carry valid barcode3, barcode4 and barcode5.

I am not an expert in statistics, and I wonder whether we could:

  1. Remove "negative" cells by classification; the remaining cells are what we want.
  2. OR, more simply, manually set a count threshold, e.g. call cells with barcode1 > 5 & barcode2 > 5 bar1&bar2 doublets, and so on.
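Option 2 can be sketched as follows. This is a minimal illustration, not part of deMULTIplex: it assumes the MULTI-seq barcode counts are available as a cells x barcodes matrix, uses a single fixed cutoff (the "> 5" from the question) for every barcode, and the function name `call_combinations` and the toy counts are hypothetical. Every cell with two or more barcodes above the cutoff is reported together with its barcode combination.

```python
import numpy as np

def call_combinations(counts, barcode_names, threshold=5):
    """Hypothetical helper: for each cell (row), return the tuple of barcodes
    whose count is strictly greater than `threshold`."""
    above = np.asarray(counts) > threshold  # boolean mask, cells x barcodes
    return [tuple(name for name, ok in zip(barcode_names, row) if ok) for row in above]

# Toy data (made up for illustration): rows are cells, columns are barcodes.
names = ["Bar1", "Bar2", "Bar3", "Bar4", "Bar5"]
counts = np.array([
    [12, 9, 0, 1, 0],    # Bar1 + Bar2
    [0, 2, 30, 8, 11],   # Bar3 + Bar4 + Bar5
    [40, 1, 0, 0, 2],    # Bar1 only (singlet)
    [1, 0, 3, 2, 0],     # nothing passes (negative)
])

calls = call_combinations(counts, names)
# Keep only cells where at least two barcodes pass the cutoff.
multiplets = {i: c for i, c in enumerate(calls) if len(c) >= 2}
print(multiplets)
```

A fixed raw-count cutoff ignores differences in sequencing depth and background between barcodes, so in practice the threshold would likely need to be chosen per barcode after looking at the count distributions.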

Which approach do you think is mathematically more reasonable? Better ideas are also welcome.

Hey @MichaelPeibo -- sorry for not responding to your email!

Is there a reason why you can't just classify the cells as described in the tutorial and perform downstream analyses on the cells the algorithm calls as multiplets?

Hi @chris-mcginnis-ucsf,
This is exactly what I meant by:

  1. Remove "negative" cells by classification; the remaining cells are what we want.

However, my goal is to find which two (or more) valid barcodes are present in each multiplet. I am not sure whether deMULTIplex provides a metric for that.
That's why I also mentioned the second option:

  2. OR, more simply, manually set a count threshold, e.g. call cells with barcode1 > 5 & barcode2 > 5 bar1&bar2 doublets, and so on.

In other words, after deMULTIplex classification, I would apply an additional count threshold to the called multiplets.
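The two-step idea (classify first, then threshold) might look roughly like this. It is a sketch under stated assumptions: the per-cell classification is assumed to be a list of labels using "Negative", "Doublet", or a barcode name, mirroring the kind of calls deMULTIplex produces; the helper `barcodes_in_doublets`, the cutoff of 5 and the toy data are hypothetical. Cells the classifier calls "Doublet" but where fewer than two barcodes pass the cutoff are collected separately, since the two criteria can disagree.

```python
import numpy as np

def barcodes_in_doublets(counts, barcode_names, labels, threshold=5):
    """Hypothetical helper. Returns (calls, unresolved):
    calls maps cell index -> tuple of barcodes above `threshold` for 'Doublet' cells;
    unresolved lists 'Doublet' cells where fewer than two barcodes pass."""
    counts = np.asarray(counts)
    calls, unresolved = {}, []
    for i, label in enumerate(labels):
        if label != "Doublet":  # step 1: drop 'Negative' and singlet calls
            continue
        # step 2: the additional count threshold
        passing = tuple(n for n, c in zip(barcode_names, counts[i]) if c > threshold)
        calls[i] = passing
        if len(passing) < 2:
            unresolved.append(i)
    return calls, unresolved

names = ["Bar1", "Bar2", "Bar3", "Bar4", "Bar5"]
counts = [
    [12, 9, 0, 1, 0],
    [0, 2, 30, 8, 11],
    [40, 1, 0, 0, 2],
    [1, 0, 3, 2, 0],
    [3, 0, 0, 20, 4],
]
labels = ["Doublet", "Doublet", "Bar1", "Negative", "Doublet"]  # made-up classifier output
calls, unresolved = barcodes_in_doublets(counts, names, labels)
print(calls, unresolved)
```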

Ah, gotcha. A simple approach would be to loop through the MULTI-seq barcode count matrix and identify the two barcodes with the most UMIs for each classified doublet. This will not identify higher-order multiplets, but these are quite rare in most datasets.
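The suggested loop could be sketched like this. It is only an illustration: it assumes a cells x barcodes UMI count matrix and a per-cell label list in which classified doublets are labelled "Doublet"; the function `top_two_barcodes` and the toy data are hypothetical. Sorting on the negated counts with `kind="stable"` gives descending order and breaks ties in favour of the earlier barcode column.

```python
import numpy as np

def top_two_barcodes(counts, barcode_names, labels):
    """Hypothetical helper: for each cell labelled 'Doublet', return the two
    barcodes with the highest UMI counts."""
    counts = np.asarray(counts)
    pairs = {}
    for i, label in enumerate(labels):
        if label != "Doublet":
            continue
        # Column indices sorted by descending count; ties keep column order.
        order = np.argsort(-counts[i], kind="stable")
        pairs[i] = (barcode_names[order[0]], barcode_names[order[1]])
    return pairs

names = ["Bar1", "Bar2", "Bar3", "Bar4", "Bar5"]
counts = [
    [12, 9, 0, 1, 0],
    [0, 2, 30, 8, 11],   # actually a triplet: only the top two are reported
    [40, 1, 0, 0, 2],
]
labels = ["Doublet", "Doublet", "Bar1"]  # made-up classifier output
pairs = top_two_barcodes(counts, names, labels)
print(pairs)
```

The second cell shows the limitation mentioned above: Bar3, Bar4 and Bar5 all have substantial counts, but only Bar3 and Bar5 are reported.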