COMBINE-lab/simpleaf

[feature request] 10x chemistry autodetection

Opened this issue · 3 comments

A recurring feature request: provide automatic chemistry detection, at least when we know that the input data is 10x. This would look something like passing -c auto10x, and simpleaf would determine the chemistry present in the input. It's probably OK to ignore 10x v1 (which requires 3 input files anyway), but most other single-cell RNA-seq chemistries should be detectable.

The basic idea would be to look at the combination of barcode and UMI lengths, as well as the overlap between the barcodes observed in a prefix of the reads and each of the available permit lists.
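A minimal Rust sketch of that idea, assuming the 10x 3' read-1 layout where the cell barcode occupies the first `bc_len` bases and the UMI follows it (16 bp barcode + 10 bp UMI for v2, 16 bp + 12 bp for v3). The `Chemistry` type, the `detect` and `permit` functions, the `min_frac` threshold, and the tiny permit lists are all hypothetical, for illustration only; this is not simpleaf's or Cell Ranger's code.

```rust
use std::collections::HashSet;

/// A candidate chemistry: read-1 geometry plus its barcode permit list.
/// (Hypothetical type for illustration; not simpleaf's data model.)
struct Chemistry {
    name: &'static str,
    bc_len: usize,
    umi_len: usize,
    permit_list: HashSet<String>,
}

/// Fraction of observed barcodes that appear in `permit`.
fn overlap_fraction(observed: &[&str], permit: &HashSet<String>) -> f64 {
    if observed.is_empty() {
        return 0.0;
    }
    let hits = observed.iter().filter(|&&bc| permit.contains(bc)).count();
    hits as f64 / observed.len() as f64
}

/// Return the name of the best-matching chemistry, or None if no candidate
/// reaches `min_frac` overlap. `reads` are read-1 sequences (ASCII) from a
/// prefix of the input.
fn detect<'a>(reads: &[&str], candidates: &'a [Chemistry], min_frac: f64) -> Option<&'a str> {
    let mut best: Option<(&'a str, f64)> = None;
    for chem in candidates {
        // Geometry filter: every read must be long enough for barcode + UMI.
        if reads.iter().any(|r| r.len() < chem.bc_len + chem.umi_len) {
            continue;
        }
        // Assumed layout: barcode = first `bc_len` bases of read 1.
        let observed: Vec<&str> = reads.iter().map(|r| &r[..chem.bc_len]).collect();
        let frac = overlap_fraction(&observed, &chem.permit_list);
        // Keep the candidate with the highest overlap above the threshold.
        if frac >= min_frac && best.map_or(true, |(_, f)| frac > f) {
            best = Some((chem.name, frac));
        }
    }
    best.map(|(name, _)| name)
}

/// Build a toy permit list from string literals (hypothetical helper).
fn permit(list: &[&str]) -> HashSet<String> {
    list.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Toy permit lists (made up); real 10x permit lists are far larger.
    let candidates = vec![
        Chemistry { name: "toy-v2", bc_len: 16, umi_len: 10,
                    permit_list: permit(&["AAAACCCCGGGGTTTT", "ACGTACGTACGTACGT"]) },
        Chemistry { name: "toy-v3", bc_len: 16, umi_len: 12,
                    permit_list: permit(&["TTTTGGGGCCCCAAAA", "TGCATGCATGCATGCA"]) },
    ];
    // 28 bp read-1 sequences: 16 bp barcode followed by 12 bp UMI.
    let reads = [
        "TTTTGGGGCCCCAAAAACGTACGTACGT",
        "TGCATGCATGCATGCAGGGGCCCCAAAA",
        "NNNNNNNNNNNNNNNNACGTACGTACGT",
    ];
    println!("{:?}", detect(&reads, &candidates, 0.5)); // prints Some("toy-v3")
}
```

Read length alone is not decisive, since read 1 is often sequenced longer than barcode + UMI; here the length check only rules candidates out, and the permit-list overlap makes the actual call.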

CellRanger's chemistry auto-detection implementation is public and available here (already in Rust): https://github.com/10XGenomics/cellranger/blob/a03981609639e55d3bef57811194c7197e8590b2/lib/rust/cr_lib/src/stages/detect_chemistry.rs#L337

You're probably already aware of this, but I'll share it for posterity if nothing else.

Thanks @AndrewSkelton, though given their license, we have to be careful here!

I just wanted to add to this: I would really appreciate it if you could include the 10x ARC multiome chemistry in this barcode auto-detection. The cellranger-atac workflow has an option to run the ARC chemistry, but I'd like to use simpleaf for the scRNA-seq quantification side of things.

Thank you!
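For the ARC case, as we understand it the multiome GEX library uses the same read-1 layout as 3' v3 (16 bp barcode + 12 bp UMI) but a different permit list, so geometry cannot separate the two and the call rests entirely on permit-list overlap. A hedged sketch of one possible decision rule: call the best-scoring list only if it beats the runner-up by a margin. The `call_by_permit_list` function, the `min_margin` rule, and the toy lists are hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of observed barcodes found in one permit list.
fn overlap(observed: &[&str], permit: &HashSet<&str>) -> f64 {
    if observed.is_empty() {
        return 0.0;
    }
    let hits = observed.iter().filter(|&&bc| permit.contains(&bc)).count();
    hits as f64 / observed.len() as f64
}

/// Call a permit list only if its overlap beats the runner-up by `min_margin`
/// (a hypothetical rule for chemistries that share the same read geometry).
fn call_by_permit_list<'a>(
    observed: &[&str],
    lists: &[(&'a str, HashSet<&str>)],
    min_margin: f64,
) -> Option<&'a str> {
    let mut scores: Vec<(&'a str, f64)> =
        lists.iter().map(|(name, pl)| (*name, overlap(observed, pl))).collect();
    // Highest overlap first.
    scores.sort_by(|a, b| b.1.total_cmp(&a.1));
    let best = scores.first()?;
    let second = scores.get(1).map_or(0.0, |s| s.1);
    if best.1 > 0.0 && best.1 - second >= min_margin {
        Some(best.0)
    } else {
        None // ambiguous or no overlap at all
    }
}

fn main() {
    // Toy stand-ins for two permit lists with the same 16 bp + 12 bp geometry.
    let v3_like: HashSet<&str> = ["AAAACCCCGGGGTTTT", "ACGTACGTACGTACGT"].iter().copied().collect();
    let arc_like: HashSet<&str> = ["TTTTGGGGCCCCAAAA", "ACGTACGTACGTACGT"].iter().copied().collect();
    let lists = vec![("v3-like", v3_like), ("arc-like", arc_like)];
    // Barcodes already extracted from the first 16 bp of read 1.
    let observed = ["TTTTGGGGCCCCAAAA", "ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"];
    println!("{:?}", call_by_permit_list(&observed, &lists, 0.2)); // prints Some("arc-like")
}
```

Returning None on a near-tie lets the caller fall back to asking the user for an explicit chemistry instead of guessing.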