Question: Are redundant bases inside overlapping ends of paired reads counted once or twice towards consensus?



JohnUrban opened this issue 2 years ago.

Hi all,

Thanks for the great program. I've been using it for years now.

I may have known this at one point or another, but definitely don't know it now. The title says it all:

Are redundant bases inside overlapping ends of paired reads counted once or twice towards consensus?

It seems like one would want to count each fragment only once, even if both of its reads cover a base. That way, fragments whose mates overlap don't get weighted more heavily than others when computing a consensus from a population of molecules.
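To make the weighting concern concrete, here is a minimal, hypothetical sketch (not Pilon's code, and not a claim about what Pilon actually does) that contrasts naive per-read counting with one-vote-per-fragment counting at a single position. It assumes mates are identified by a shared read name, as with the SAM QNAME field. The helper names `naive_counts` and `fragment_weighted_counts` are illustrative, and the policy of skipping a fragment whose mates disagree is an arbitrary choice made for this sketch.

```python
from collections import Counter, defaultdict


def naive_counts(observations):
    """Count every read's base call, so overlapping mates vote twice."""
    return Counter(base for _, base in observations)


def fragment_weighted_counts(observations):
    """Count base calls with one vote per fragment.

    observations: iterable of (read_name, base) tuples at one position.
    Mates of the same fragment share a read_name, so two overlapping
    mates covering this position produce two tuples with the same name.
    """
    per_fragment = defaultdict(list)
    for name, base in observations:
        per_fragment[name].append(base)

    counts = Counter()
    for bases in per_fragment.values():
        if len(set(bases)) == 1:
            # Mates agree (or only one read covers the base): one vote.
            counts[bases[0]] += 1
        # Mates disagree: this sketch skips the fragment rather than guess.
    return counts


# Fragment f1 has both mates over the position; f2 and f3 have one read each.
obs = [("f1", "A"), ("f1", "A"), ("f2", "C"), ("f3", "C")]
print(naive_counts(obs))              # A and C tie at 2 votes each
print(fragment_weighted_counts(obs))  # C wins, 2 fragments to 1
```

Per-read counting turns two fragments supporting C into a tie with one fragment supporting A; per-fragment counting restores the 2-to-1 majority.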

I have 2 x 300 bp paired-end reads from MiSeq. My collaborators aimed for 600 bp fragments, but the overwhelming majority of fragments are ~300-350 bp, meaning the paired reads are nearly 100% redundant in most cases (each 300 bp mate shares roughly 250-300 bp with its partner).
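The redundancy figure follows from simple arithmetic: for read length L and fragment length F with L ≤ F ≤ 2L, the two mates share 2L − F bases. A small sketch, assuming full-length untrimmed reads (the function name `mate_overlap_bp` is illustrative only):

```python
def mate_overlap_bp(read_len: int, fragment_len: int) -> int:
    """Bases of the fragment covered by both mates.

    Assumes untrimmed reads of equal length sequenced from both ends.
    - fragment_len <= read_len: each mate spans the whole fragment (the
      rest of the read is adapter), so the overlap is the full fragment.
    - read_len < fragment_len <= 2 * read_len: overlap is 2L - F.
    - fragment_len > 2 * read_len: the mates do not meet.
    """
    return max(0, min(fragment_len, 2 * read_len - fragment_len))


READ_LEN = 300
for frag in (300, 325, 350, 600):
    ov = mate_overlap_bp(READ_LEN, frag)
    print(f"{frag} bp fragment: {ov} bp overlap ({ov / READ_LEN:.0%} of each read)")
```

With 2 x 300 bp reads, a 600 bp fragment gives no overlap, while 300-350 bp fragments give 250-300 bp of overlap, i.e. about 83-100% of each read.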

I suppose a related question would be:

What is the best approach in this scenario?
- (i) supply the BAM of paired-end read alignments?
- (ii) supply a BAM of just one mate from each pair, mapped as single-end reads?
- (iii) map as pairs for any potential gain in mapping specificity*, then extract the first read of each pair into a new BAM? (*Given the high redundancy between mates, the benefit is likely minimal here.)
- (iv) merge overlapping pairs into single reads where possible (e.g. with fastp: https://github.com/OpenGene/fastp), and supply a mixture of unmerged paired reads and merged single reads?
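For option (iii), the extraction step can be done on the SAM FLAG field: bit 0x1 marks a paired read and bit 0x40 marks the first segment in the template. For paired data, `samtools view -b -f 64 in.bam` keeps only records with 0x40 set. The stdlib sketch below shows the same logic while also passing through any unpaired records. It is a hypothetical helper, not part of Pilon. It assumes SAM text input (e.g. from `samtools view -h`), and it leaves the mate-related FLAG bits and RNEXT/PNEXT fields untouched. How a downstream tool treats such orphaned mates is not something this sketch settles.

```python
PAIRED = 0x1          # SAM FLAG: template has multiple segments
FIRST_IN_PAIR = 0x40  # SAM FLAG: first segment in the template


def keep_first_mates(sam_lines):
    """Yield SAM header lines plus only the first mate of each pair.

    Records without the PAIRED bit (e.g. merged or single-end reads) pass
    through unchanged. Secondary/supplementary records of a first mate also
    carry 0x40, so they are kept as well.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header line (@HD, @SQ, @RG, @PG, ...)
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd column
        if (not flag & PAIRED) or (flag & FIRST_IN_PAIR):
            yield line
```

One way to wire it up is `sys.stdout.writelines(keep_first_mates(sys.stdin))` in a small script placed between `samtools view -h in.bam` and `samtools view -b -o first_mates.bam -`.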

I will also post this to the pilon-users mailing list. If I get an answer from one, I will post the answer to the other as well to close both out.

Best,

John