PE library too large when reads have different length
jgarthur opened this issue · 3 comments
jgarthur commented
When two reads in a pair have different sequence lengths, the pair may be falsely put into the PE virtual library. I think the bug is in matepair.cpp: many statements of the form b<rp.r1.l should in fact be b<rp.r2.l.
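To make the suspected problem concrete, here is a minimal sketch, not the actual matepair.cpp code. Only the field names r1, r2 and l come from the expressions above. The Read and ReadPair structs, the function names, and the assumption that b is a position within read 2 are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the real read types; only the member names
// r1, r2 and l are taken from the expressions quoted in this issue.
struct Read {
    std::string seq;
    std::size_t l;  // sequence length
    explicit Read(const std::string& s) : seq(s), l(s.size()) {}
};

struct ReadPair {
    Read r1, r2;
};

// Suspected buggy form: b is meant to index read 2 (an assumption here),
// but it is bounded by the length of read 1.
bool inRead2Buggy(const ReadPair& rp, std::size_t b) { return b < rp.r1.l; }

// Proposed form: the bound uses the length of the read being indexed.
bool inRead2Fixed(const ReadPair& rp, std::size_t b) { return b < rp.r2.l; }

// Access guarded by the corrected bound.
char baseInRead2(const ReadPair& rp, std::size_t b) {
    assert(inRead2Fixed(rp, b));
    return rp.r2.seq[b];
}
```

With a pair such as ReadPair rp{Read("ACGTACGTAC"), Read("ACGTAC")}, position 8 passes the first check even though read 2 has only 6 bases, while the second check rejects it. When both reads have the same length the two checks agree, which would explain why the problem only shows up for pairs of unequal length.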
Here is a read pair reproducing the issue:
@NS500193:36:H2WWLAFXX:1:11101:13299:12241 1:N:0:1
TCATGGCTTTCCGGTCTCAATAAAACAACCATAAGTGAAAATAATATCCAATAAACTTGATAACATACAACATTCAAGTCACCTAAGAGTTGTCAAGCATTTCCAGCATTTTAGAATTGCCCTTACCATTAATAGTATTATAGAAATGGCTAAGTCTCC
+
<A7AAFFFFFFFFFFFFAFFFFFAFFFFFFAF<FFFFFFFFFAF<FFFAAFFAAFFFFFAFAFFFFFFFFFFFFFFFF7F<7FFFAFAFA<FFFAFFFFFFFFFAFF.<FF.F.FFF)FFF<F7.FF.<F.FF)FA7F)FF)<.7FFFFFAFF7FFAF)
@NS500193:36:H2WWLAFXX:1:11101:13299:12241 2:N:0:1
AAATGCAAGTGAACTAAACAACAGCTAAGTCAGCCTTCAATAGATATTTGTTAAATAGGTAGTTGTCTCAGGGTACCATCTCTGCCCATCAGCAATAGACGCTTTTTATGGATTTTCCTTACCTTCTCCACCTTACTAATTAATTCTACATTTATCTC
+
.AAA<FFFFFFAFFFA.<FFFFAFFFFFFFF.FFFFFFAFFFFFFFAFFFFFF<7FFFFF7F<FFFFAF<FFAF.<F<FFFFF<F.FAFF<FFA<)FA)AAAFFFF7FFFAAF.FFFFFAFFFAFF)FA..A)AFFF.FFFA.FA.)FA)7FF<AA7<
jaredo commented
I'm very sorry for not replying earlier. I think this is fixed in the most recent commit (aac92fb).
best,
Jared
jgarthur commented
No worries! Just for completeness, lines 265 and 274 also have the b<rp.r1.l comparison.
jaredo commented
Thanks. Fixed these too.