Feature request: Trust the wild type call when resolving overlap read mismatches

Question

hezscha opened this issue 4 years ago · 2 comments

Hi,
Regarding mismatch resolution between overlapping read pairs: my lab was wondering whether it would be possible to resolve mismatches by assuming that the base matching the wild type is correct, in addition to using the base call qualities.
We have a site saturation mutagenesis library that should consist mostly of single amino acid variants, so there is strong support for wild type being the correct call when the disagreeing bases from the forward and reverse reads have the same quality.
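
To make the request concrete, here is a minimal sketch of the rule we have in mind. It is not Enrich2 code: the function names, the `unresolved="X"` placeholder, and the assumption that the reverse read has already been reverse-complemented and aligned to the same coordinates as the forward read are all ours.

```python
def resolve_base(fwd_base, fwd_qual, rev_base, rev_qual, wt_base, unresolved="X"):
    """Resolve one overlapping position (hypothetical helper, not Enrich2's API)."""
    # Both reads agree: nothing to resolve.
    if fwd_base == rev_base:
        return fwd_base
    # Qualities differ: keep the base with the higher quality score.
    if fwd_qual != rev_qual:
        return fwd_base if fwd_qual > rev_qual else rev_base
    # Qualities tie: trust whichever read matches the wild type, if either does.
    if wt_base in (fwd_base, rev_base):
        return wt_base
    # Tie and neither read matches wild type: leave the position unresolved.
    return unresolved


def resolve_overlap(fwd, fwd_quals, rev, rev_quals, wt):
    """Apply resolve_base across an overlap region of equal-length inputs."""
    return "".join(
        resolve_base(f, fq, r, rq, w)
        for f, fq, r, rq, w in zip(fwd, fwd_quals, rev, rev_quals, wt)
    )


# Example: one mismatch (G vs T) with tied qualities, where G is wild type.
merged = resolve_overlap("ACGT", [30] * 4, "ACTT", [30] * 4, "ACGT")
```

With this rule the wild type is only a tie-breaker, so a variant base supported by a higher quality score would still be called.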

Answer 1 · 2021-02-17T00:00:59.000Z

This is a good suggestion. I had originally planned to drop the overlapping paired end read mode in future projects, instead suggesting that users use FLASH2 or a similar program to calculate read overlaps, but this wild-type awareness could be a compelling reason to keep it around.

Answer 2 · 2021-12-09T23:22:09.000Z

@afrubin any thoughts on how Enrich2 will handle this moving forward?

I second @hezscha's suggestion. I currently get a very high proportion of variant calls to "X" (~86%) with non-mutagenized (negative control) alignments. I see no reason not to resolve the variant when the base qualities are the same, especially if they are high. Unfortunately, such a high proportion of "X" calls makes Overlap mode unusable for my case. The easiest workaround seems to be running in Basic mode, either with R1 only or after merging pairs with PEAR, FLASH2, etc.