chris-mcginnis-ucsf/MULTI-seq

cannot find barcode threshold

MichaelPeibo opened this issue · 2 comments

Hi @chris-mcginnis-ucsf
I ran into a warning saying 'cannot find threshold' for certain barcodes.
Any suggestions on what causes it and how to fix it?

Actually, my dataset is a bit different from the MULTI-seq sample data: I have only five barcodes, and their expression pattern (expression abundance) does not look like that of the MULTI-seq sample barcodes. Ours looks much more like gene expression.
Any suggestions on how to determine the valid barcodes for this?

Hi @MichaelPeibo ,

Can you show me the barcode space for your data? Also histograms of log-normalized counts for each barcode?

I find that removing uninformative cells can help in barcode identification. For example, you can remove cells with fewer than X total barcode UMIs. You could also compute each cell's signal-to-noise (e.g., ratio of the top two most abundant barcodes for each cell) and remove cells with SNR values <1.1.
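A minimal sketch of the two filters described above, in plain Python rather than the package's own code. The names `top_two_snr`, `filter_cells`, and `min_total_umis` are hypothetical, the toy counts are made up for illustration, and the UMI cutoff (the "X" above) is left for you to choose from your own data.

```python
def top_two_snr(counts):
    """Ratio of the most abundant to the second most abundant barcode in one cell."""
    top, second = sorted(counts, reverse=True)[:2]
    # Assumption: a cell whose second barcode has zero counts is treated as infinitely clean.
    return top / second if second > 0 else float("inf")


def filter_cells(barcode_counts, min_total_umis, min_snr=1.1):
    """Keep cells with enough total barcode UMIs and a top-two ratio of at least min_snr."""
    kept = {}
    for cell, counts in barcode_counts.items():
        if sum(counts) < min_total_umis:
            continue  # too few barcode UMIs to be informative
        if top_two_snr(counts) < min_snr:
            continue  # top two barcodes are nearly tied
        kept[cell] = counts
    return kept


# Toy data: one row of five barcode counts per cell (illustrative values only).
barcode_counts = {
    "cell_a": [120, 3, 2, 1, 0],
    "cell_b": [2, 1, 0, 0, 1],    # dropped: only 4 barcode UMIs in total
    "cell_c": [40, 38, 1, 0, 2],  # dropped: 40 / 38 is below 1.1
    "cell_d": [5, 90, 4, 2, 1],
}
kept = filter_cells(barcode_counts, min_total_umis=20)
print(sorted(kept))
```

The cutoff of 20 UMIs here is arbitrary; looking at the distribution of total barcode UMIs per cell first is probably the safer way to pick it.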

Chris

Hi @chris-mcginnis-ucsf
Here is the density plot for one barcode; the others are quite similar.
[image: density plot of one barcode]
A given barcode has many low counts across all cells.

I figured out one way to identify validly barcoded cells, though I do not know whether it is reasonable: it depends mainly on a manually chosen threshold on each barcode's empirical cumulative distribution.
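To make the idea concrete, here is a rough sketch of one possible reading of it, not necessarily the exact procedure used on our data: for a single barcode, compute the empirical cumulative distribution of its counts over all cells, then call a cell positive if its count lies above a hand-picked ECDF value. The names `ecdf`, `call_positive_cells`, and `ecdf_cutoff` are hypothetical, and the counts are toy values.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def call_positive_cells(counts_by_cell, ecdf_cutoff):
    """Cells whose count for this barcode lies above the chosen ECDF value."""
    F = ecdf(counts_by_cell.values())
    return {cell for cell, count in counts_by_cell.items() if F(count) > ecdf_cutoff}


# Toy counts of ONE barcode across six cells: mostly low background, two high cells.
counts_by_cell = {"c1": 0, "c2": 1, "c3": 1, "c4": 2, "c5": 50, "c6": 65}

# The cutoff is chosen by eye from the ECDF curve; 0.7 is only an illustration.
positives = call_positive_cells(counts_by_cell, ecdf_cutoff=0.7)
print(sorted(positives))
```

The weak point is that the cutoff is subjective, so each barcode needs its own manual choice.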

We are preparing manuscripts for this project, so I am sending you the details via email.
It would be a great help if you could give some critical advice on it. 😀