COMBINE-lab/pufferfish

Puffaligner doesn't map read pairs to different references

apcamargo opened this issue · 2 comments

Hi,

There are some applications where it's important to identify read pairs whose reads map to different references. Even though Puffaligner maps reads independently ("(…) we consider the chaining and chain filtering for each end of the read separately."), I couldn't find any pair consisting of mates that map to different references.

In comparison, Bowtie2 maps ≈ 1.6% of the read pairs to different references with the same inputs.

Hi @apcamargo ,

Thank you for your post. However, I am not sure if I understand the request clearly.
Would you mind explaining a little bit more?

Sure, @fataltes!

Here's Puffaligner's (using --bestStrata) samtools flagstat output:

214688504 + 0 in total (QC-passed reads + QC-failed reads)
50488220 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
125913389 + 0 mapped (58.65% : N/A)
164200284 + 0 paired in sequencing
82100142 + 0 read1
82100142 + 0 read2
83360444 + 0 properly paired (50.77% : N/A)
83360444 + 0 with itself and mate mapped
6101721 + 0 singletons (3.72% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Here's Bowtie2's (using -k 15):

241492571 + 0 in total (QC-passed reads + QC-failed reads)
77292287 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
159016623 + 0 mapped (65.85% : N/A)
164200284 + 0 paired in sequencing
82100142 + 0 read1
82100142 + 0 read2
74243714 + 0 properly paired (45.22% : N/A)
77436030 + 0 with itself and mate mapped
4288306 + 0 singletons (2.61% : N/A)
2489036 + 0 with mate mapped to a different chr
2027014 + 0 with mate mapped to a different chr (mapQ>=5)

Puffaligner's "with mate mapped to a different chr" count is 0, meaning that there are no pairs whose reads mapped to different references.

Essentially, I'm interested in alignments where the 7th field (RNEXT) is not =, for example:

HISEQ13:355:CBN0FANXX:7:1101:17319:1971	97	k147_2000503	17	38	150M	k147_584177	66	0	CGGCGGACTAAGGCTCTATAATTTCAATTTTTCACCAGACTAAGTAATCCATGAAGAAACTCATTGCAGCACTGGCTTCCAGTGTTCTGGTGATGTCCGCCGCCGTCGCCCAGACGCTGCCGGCGCCGACCATCGCCGCCAAATCGTGGC	=ABBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGFGG>GEGGGGGGDGFGGCGDGDGGGGG<DGGGGGGGBGGGGGGGGGGGGGGGGGGGGGGGGGG@	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:150	YT:Z:UP
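
For reference, here is a minimal Python sketch of how such records could be picked out of plain-text SAM output. It is not part of Puffaligner, Bowtie2, or samtools; the function names and the `primary_only` parameter are hypothetical and used only for illustration. It assumes tab-separated SAM records, treats a record as a hit when both mates are mapped and the 7th field (RNEXT) is neither `=` nor `*`, and by default skips secondary and supplementary records, which may not exactly match how `samtools flagstat` counts.

```python
# Sketch: find SAM records whose mate is aligned to a different reference.
# Function names and the primary_only parameter are hypothetical.

FLAG_PAIRED = 0x1
FLAG_READ_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mate_on_different_reference(sam_line, primary_only=True):
    """Return True if this record and its mate map to different references."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        # Not a complete alignment record (SAM has 11 mandatory fields).
        return False
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    if not flag & FLAG_PAIRED:
        return False
    if flag & (FLAG_READ_UNMAPPED | FLAG_MATE_UNMAPPED):
        return False
    if primary_only and flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False
    # "=" means the mate is on the same reference; "*" means unavailable.
    return rnext not in ("=", "*") and rnext != rname


def count_mates_on_different_references(lines, primary_only=True):
    """Count qualifying records in an iterable of SAM lines, skipping headers."""
    return sum(
        1
        for line in lines
        if not line.startswith("@")
        and mate_on_different_reference(line, primary_only)
    )
```

The lines could come, for example, from `samtools view` output read on standard input and passed to `count_mates_on_different_references`.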