Use PCR duplicates for error correction?

Question

Closed this issue a month ago · 2 comments

Is your feature related to a problem?

Some reads are inevitably thrown out because no barcode is found, because of basecalling errors, because bases fall below the Q-score threshold, etc.

Describe the solution you'd like

If you have multiple reads with the same UMI, and some reads have low-quality bases in the cell barcode sequence, could you use the higher-quality barcode sequences from the PCR duplicates to correct the other barcode and retain the read? And could this likewise be used to correct internal nucleotide sequences?

Describe alternatives you've considered

Additional context

No response

Answer 1 · 2024-06-03T18:50:13.000Z

Hi @itslittman

The cell barcodes are assigned first, and these are then used, along with gene name, to partition the reads. This reduces the search space for UMI correction and reduces UMI collisions. I guess there could be a rescue step after UMI assignment, where reads that were rejected because no valid barcode was found could be fished out by UMI/gene ID. It's an interesting idea.
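To make the rescue idea concrete, here is a minimal sketch of what such a step might look like. It is not part of the workflow: the function name `rescue_barcodes` and the read fields (`read_id`, `barcode`, `umi`, `gene`) are hypothetical, and it assumes UMIs have already been corrected and genes assigned. A read is only rescued when its UMI/gene pair points at exactly one barcode, since the same UMI can legitimately turn up in more than one cell.

```python
from collections import defaultdict


def rescue_barcodes(reads):
    """Return {read_id: barcode} for barcode-less reads that can be rescued.

    `reads` is an iterable of dicts with the (hypothetical) keys
    read_id, barcode (None when no valid barcode was found), umi and gene.
    """
    # Collect the barcodes seen for each (UMI, gene) pair on assigned reads.
    seen = defaultdict(set)
    for read in reads:
        if read["barcode"] is not None:
            seen[(read["umi"], read["gene"])].add(read["barcode"])

    rescued = {}
    for read in reads:
        if read["barcode"] is not None:
            continue
        candidates = seen.get((read["umi"], read["gene"]), set())
        # Skip ambiguous pairs: the same UMI/gene in two cells is a collision,
        # so there is no safe way to pick one barcode.
        if len(candidates) == 1:
            rescued[read["read_id"]] = next(iter(candidates))
    return rescued
```

Reads whose UMI/gene pair is never seen on an assigned read, or is seen with several barcodes, are simply left out of the result and stay rejected.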

You have a second question about correcting internal nucleotides. This could be much more easily done by generating consensus sequences for reads with the same barcode/UMI/gene and might be something that will be added to the workflow.
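As a rough illustration of the consensus idea (not the workflow's implementation), the sketch below groups reads by barcode/UMI/gene and takes a per-position majority vote. It makes a strong simplifying assumption: the sequences in each group are already aligned to one another and have equal length. Real reads contain insertions and deletions and would need an alignment or polishing step first. The names `column_consensus` and `consensus_by_group` and the read fields are hypothetical.

```python
from collections import Counter, defaultdict


def column_consensus(seqs):
    """Majority-vote consensus of equal-length, pre-aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    consensus = []
    for column in zip(*seqs):
        counts = Counter(column).most_common()
        # Emit N when the two most common bases tie, rather than guessing.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append("N")
        else:
            consensus.append(counts[0][0])
    return "".join(consensus)


def consensus_by_group(reads):
    """Return {(barcode, umi, gene): consensus} over reads sharing that key."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["barcode"], read["umi"], read["gene"])
        groups[key].append(read["seq"])
    return {key: column_consensus(seqs) for key, seqs in groups.items()}
```

With only two reads in a group, any disagreement is a tie and comes out as `N`, so this kind of correction only helps for molecules with three or more PCR duplicates.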

Answer 2 · 2024-07-18T13:48:35.000Z

Closing due to lack of response