single-cell-genetics/cellsnp-lite

UMI Collapsing

grasshoffm opened this issue · 5 comments

Hello to the cellsnp-lite team,

I wanted to ask how cellSNP-lite performs the UMI collapsing?

Best regards,

Martin

hxj5 commented

Hi Martin,

Thanks for the question. First, cellsnp-lite relies on the UMI collapsing output from upstream tools (e.g., CellRanger corrects sequencing errors within UMI sequences prior to UMI counting). Second, for SNP pileup within one UMI (i.e., a group of reads from the same source RNA molecule), cellsnp-lite currently uses the allele extracted from the first read as the consensus allele for that UMI. This strategy is simple but effective in practice, since sequencing error rates have decreased with technical advances. It could be improved by considering the sequences and corresponding base qualities from all reads in the UMI.
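To make the first-read rule concrete, here is a minimal Python sketch. It is not cellsnp-lite's actual code (cellsnp-lite is written in C); the record layout and the function names `first_read_consensus` and `count_alleles_per_cell` are hypothetical and only illustrate the idea on toy data.

```python
from collections import Counter

# Hypothetical toy records for one SNP: (cell_barcode, umi, allele_at_snp).
# Reads are assumed to arrive in BAM order.
reads = [
    ("AAAC", "UMI1", "A"),
    ("AAAC", "UMI1", "G"),  # later read of the same UMI, ignored
    ("AAAC", "UMI2", "G"),
    ("TTTG", "UMI1", "A"),
]

def first_read_consensus(reads):
    """Keep the allele of the first read seen for each (cell, UMI) pair."""
    consensus = {}
    for cell, umi, allele in reads:
        # setdefault only stores the allele if the key is not present yet
        consensus.setdefault((cell, umi), allele)
    return consensus

def count_alleles_per_cell(consensus):
    """Count one allele observation per UMI, grouped by cell barcode."""
    counts = {}
    for (cell, _umi), allele in consensus.items():
        counts.setdefault(cell, Counter())[allele] += 1
    return counts

umi_alleles = first_read_consensus(reads)
cell_counts = count_alleles_per_cell(umi_alleles)
```

In this toy input, the second read of UMI1 in cell AAAC carries a different allele, but it does not affect the count because only the first read of each UMI is used.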

Best,
Xianjie

Hi Xianjie,

Thanks for your swift reply. That explains everything.

Best,

Martin

@hxj5 Hi, I'm working on single-cell long-read data for which amplification of specific genes was performed. In that case, PCR errors can occur and there can be reads that are only partially amplified etc. If only the first read per transcript is considered, this could be suboptimal. We get 60 and more reads for a single transcript.
Do you have a recommendation for how to handle this? Would it make sense to sort reads by the alignment score (AS) tag first, e.g. with samtools sort -t AS (increasing order), and then reverse the order with tac?

hxj5 commented

Hi, thanks for the question. cellsnp-lite was designed for short reads. It may not fit well if the PCR and sequencing error rates of your long-read data are much higher than those of short-read data. As for sorting by AS, in my opinion it seems more reasonable to sort by the sequencing qualities of the individual alleles at the target SNPs, which effectively amounts to an error-correcting UMI collapsing step. I would suggest using pileup/genotyping tools tailored for long-read data, or short-read tools that take UMI collapsing into account (e.g., vartrix if I remember correctly).
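As a rough illustration of the quality-based idea, the following Python sketch picks, for each UMI, the allele from the read with the highest base quality at the target SNP. This is equivalent to sorting the reads of a UMI by that quality and taking the first one. It is only a sketch on toy data, not a feature of cellsnp-lite; the record layout and the function name `best_quality_consensus` are hypothetical.

```python
# Hypothetical toy records for one SNP:
# (cell_barcode, umi, allele_at_snp, base_quality_at_snp)
long_reads = [
    ("AAAC", "UMI1", "G", 12),  # low-quality base, likely an error
    ("AAAC", "UMI1", "A", 37),
    ("AAAC", "UMI1", "A", 30),
    ("AAAC", "UMI2", "G", 35),
]

def best_quality_consensus(reads):
    """For each (cell, UMI), keep the allele of the read with the highest
    base quality at the SNP. On ties, the earlier read is kept."""
    best = {}
    for cell, umi, allele, qual in reads:
        key = (cell, umi)
        if key not in best or qual > best[key][1]:
            best[key] = (allele, qual)
    # Drop the quality and return only the chosen allele per UMI
    return {key: allele for key, (allele, _qual) in best.items()}

quality_consensus = best_quality_consensus(long_reads)
```

With the first-read rule, UMI1 in this toy input would be assigned G. With the quality-based rule it is assigned A. A vote over all reads, weighted by quality, would be another option when a UMI has many reads.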

Thanks a lot for your recommendation!