tonydisera/gene.iobio

Min number or min percent of alt counts

Opened this issue · 3 comments

Have we ever thought of adding a filter for the minimum number (or percent) of alt alleles observed?

I see this sort of stuff all the time, where there are very few alt counts and the ratio is well below a nice 50/50 het split. This would allow the user to "exclude variants with fewer than 5 alt counts" or "exclude variants with less than 20% alt counts".

image
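A minimal sketch of what such a filter could look like, assuming each variant carries a ref count and an alt count. The `Variant` record, its field names, and `passes_alt_filter` are hypothetical names for illustration, not gene.iobio code, and depth is approximated here as ref + alt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative only.
    ref_count: int
    alt_count: int


def passes_alt_filter(
    variant: Variant,
    min_alt_count: Optional[int] = None,
    min_alt_fraction: Optional[float] = None,
) -> bool:
    """Return True if the variant meets every threshold that was supplied."""
    # Assumption: depth is ref + alt; real data may include other alleles.
    depth = variant.ref_count + variant.alt_count

    if min_alt_count is not None and variant.alt_count < min_alt_count:
        return False

    if min_alt_fraction is not None:
        # With no reads there is no evidence at all, so the variant fails.
        if depth == 0:
            return False
        if variant.alt_count / depth < min_alt_fraction:
            return False

    return True


# "Exclude variants with fewer than 5 alt counts"
enough_reads = passes_alt_filter(Variant(ref_count=45, alt_count=5), min_alt_count=5)

# "Exclude variants with less than 20% alt counts" (5 of 50 reads is 10%)
enough_fraction = passes_alt_filter(Variant(ref_count=45, alt_count=5), min_alt_fraction=0.2)
```

Leaving either threshold as `None` skips that check, so the count and percent filters can be used separately or together.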

Yes, @tonydisera and I discussed this a while back, but I'm not sure if it made it into an issue. I saw a whole bunch of deletions in the same exon for a few SFARI samples, all of them high-impact frameshift variants. But they all had ~10% alt alleles, so clearly none of the deletions were real. I still haven't got access to the raw data to figure out what the actual sequence is in that gene, but apparently Rufus didn't call anything. The variants as called clearly aren't real, so filtering on the alt percentage is necessary.

We have also discussed (and this is probably an issue somewhere) including a filter on strand bias, to make sure that the variant is represented on both strands.
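A rough sketch of the "represented on both strands" check, assuming per-strand alt read counts are available for each variant. The function and parameter names are hypothetical, and this is a simple presence check rather than a statistical strand-bias test.

```python
def alt_on_both_strands(
    alt_forward: int,
    alt_reverse: int,
    min_per_strand: int = 1,
) -> bool:
    """Return True if the alt allele is seen on both strands.

    alt_forward / alt_reverse are hypothetical inputs: the number of
    alt-supporting reads on the forward and reverse strands.
    """
    return alt_forward >= min_per_strand and alt_reverse >= min_per_strand


# Alt reads on both strands: kept.
balanced = alt_on_both_strands(alt_forward=3, alt_reverse=2)

# All alt reads on the forward strand: flagged as a likely artifact.
one_sided = alt_on_both_strands(alt_forward=5, alt_reverse=0)
```

Raising `min_per_strand` makes the check stricter for deeper data.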

Cool cool. Now we have an issue for it. Also nice to see that this is an artifact that occurs in other datasets as well. It's not just a UCGD/Sentieon thing; it also happens in SFARI data.

There would definitely be utility in this.

Yeah, this is common. This was the impetus for my most recent push to get pileup available as well. I found these cases, and clearly there's something going on in the data, but I don't have UNIX access to the BAM files, so my investigation petered out.