[feature request] 10x chemistry autodetection
A recurring feature request: provide automatic chemistry detection, at least in the case where we know that the input data is 10x. This would look something like passing -c auto10x, and simpleaf would determine the chemistry present in the input. It’s OK, probably, to ignore 10x v1 (which anyway requires 3 input files), but most other single-cell RNA-seq chemistries should be detectable.
The basic idea would be to look at the combination of UMI and barcode lengths, and also at the overlap between the barcodes observed in a prefix of the reads and each of the available permit lists.
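To make the idea concrete, here is a rough sketch of the scoring step. It is not how simpleaf or CellRanger does it. Chemistry, score_chemistry, detect_chemistry and the min_fraction cutoff are hypothetical names made up for illustration, and the permit lists are tiny toy sets standing in for the real files. The only thing taken as given is the usual 10x geometry: a 16 bp barcode followed by a 10 bp UMI for v2, and by a 12 bp UMI for v3.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Chemistry:
    """Hypothetical description of one candidate chemistry."""
    name: str
    barcode_len: int
    umi_len: int
    permit_list: FrozenSet[str]  # barcodes allowed for this chemistry


def score_chemistry(chem: Chemistry, r1_seqs: Sequence[str]) -> float:
    """Fraction of reads that are long enough to hold barcode + UMI
    and whose leading barcode_len bases appear in the permit list."""
    if not r1_seqs:
        return 0.0
    needed = chem.barcode_len + chem.umi_len
    hits = sum(
        1
        for seq in r1_seqs
        if len(seq) >= needed and seq[: chem.barcode_len] in chem.permit_list
    )
    return hits / len(r1_seqs)


def detect_chemistry(
    chems: Sequence[Chemistry],
    r1_seqs: Sequence[str],
    min_fraction: float = 0.5,  # assumed cutoff, would need tuning on real data
) -> Optional[str]:
    """Return the name of the best-scoring chemistry, or None when no
    candidate reaches min_fraction or the top two candidates tie."""
    scored = sorted(
        ((score_chemistry(c, r1_seqs), c.name) for c in chems), reverse=True
    )
    if not scored:
        return None
    best_score, best_name = scored[0]
    if best_score < min_fraction:
        return None
    if len(scored) > 1 and scored[1][0] == best_score:
        return None  # ambiguous, better to ask the user than to guess
    return best_name


# Toy data: two-entry permit lists instead of the real permit list files.
toy_v2 = Chemistry("10xv2", 16, 10, frozenset({"A" * 16, "C" * 16}))
toy_v3 = Chemistry("10xv3", 16, 12, frozenset({"G" * 16, "T" * 16}))

# A "prefix" of R1 reads: two v3-like reads (28 bp) and one v2-like read (26 bp).
toy_reads: List[str] = [
    "G" * 16 + "ACGTACGTACGT",
    "T" * 16 + "ACGTACGTACGT",
    "A" * 16 + "ACGTACGTAC",
]

print(detect_chemistry([toy_v2, toy_v3], toy_reads))
```

Read length alone is not enough in practice, since R1 is sometimes sequenced longer than barcode + UMI, which is why the permit-list overlap carries most of the weight here. Returning None on a tie or a low score keeps the auto mode from silently picking the wrong chemistry.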
CellRanger's implementation of chemistry auto-detection is public (and already in Rust): https://github.com/10XGenomics/cellranger/blob/a03981609639e55d3bef57811194c7197e8590b2/lib/rust/cr_lib/src/stages/detect_chemistry.rs#L337
While you're probably already aware of this, I'll share it for posterity if nothing else.
Thanks @AndrewSkelton, though given their license, we have to be careful here!
I just wanted to add to this: I would really appreciate it if you could include the 10X ARC multiome chemistry in this automatic barcode detection. The cellranger-atac workflow has an option to run the ARC chemistry, but I'd like to use simpleaf for the scRNA quantification side of things.
Thank you!