FowlerLab/Enrich2

Error: Variants not being counted

iaincambeul opened this issue · 2 comments

Hello,

I think I am attempting a fairly simple analysis, but I am having trouble getting any variants detected in a dataset where I can clearly see the missense mutations in the FASTQ files.

I run FASTQ data like this:

@GWNJ-1013:157:GW2103221080th.Miseq:1:2101:10664:1047
CCTGGATATTAGCGAAAACGCGCTGAAAAAAGCGCGCGAAACCTTTAGCACCATGCCGAACAGCAGCTGCTTTAGCTTTGTGAAAGAAGATGTGTTTACCTGGCGCCCGGAACAGCCGTTTGATTTTATTTTTGATTATGTGTTTTTTTGCGCGATTGATCCGAAAATGCGCCCGGCGTGGGGCAAAGCGATGTATGAACTGCTGAAACCGGATGGCGAAGGAATTACCCTGATGTATCCGATTACCAACCATGAAGGCGGCCCGCCGTTTAGCGTGAGC
+
FFFFFFFFFFFFFFFFFFFFFFF,FF,FFFGHHHHHHHHHHHHHHGHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHGHHHGHHHHHHHHHHHHGHBHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHGHFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@GWNJ-1013:157:GW2103221080th.Miseq:1:2101:5674:1078
CCTGGATATTAGCGAAAACGCGCTGAAAAAAGCGCGCGAAACCTTTAGCACCATGCCGAACAGCAGCTGCTTTAGCTTTGTGAAAGAAGATGTGTTTACCTGGCGCCCGGAACAGCCGTTTGATTTTATTTTTGATTATGTGTTTTTTTGCGCGATTGATCCGAAAATGCGCCCGGCGTGGGGCAAAGCGATGTATGAACTGCTGAAACCGGATCACGAACTGATTACCCTGATGTATCCGATTACCAACCATGAAGGCGGCCCGCCGTTTAGCGTGAGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFHHHHHHHHHH?GHHHHHHHGHHGHHHHHHHHGHHHHHHHGHHHHHHHHHHHHHHHHHHHHAHHHGHBHHHHHHHHHHHHHHHGHHHHHH;HHGHHHHHGFHHHHFGFHHGHHHHHHHHHGHHHHHHHHHHHHHHHHHGHHHH?HGHHHGHHHHHHHHHHHHGHHGHHHHHH:HHHHHHHHHHHHHHGHHHHHHHFHGHHGHGGHHHGHHHHHGHHHHHHHFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

with the attached parameters (not in .txt format) and fail to detect any variants, even with a "basic" seqLib object and fairly permissive parameters. Any thoughts on how I can fix this?

ICl05_config.txt

It looks like your wild-type sequence is not the same length as your reads. That's going to cause your variants to get filtered out, since Enrich2 isn't designed with this use case in mind.

You can either modify your wild-type sequence so that it matches the position of the reads (assuming they all have the same start/end) or create a new FASTQ file where all the "reads" match the wild-type sequence in length and the original read is included at the correct relative position. The first option is much better if your data are amenable.
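As a rough illustration of the two options above, here is a minimal Python sketch. It is not part of Enrich2 and does not use any Enrich2 API; the helper names (`best_offset`, `trim_wild_type`, `pad_read`) are hypothetical. It assumes every read starts at the same position in the wild-type sequence, that reads contain substitutions only (no indels), and, for the second option, that padding with wild-type bases and a high placeholder quality character is acceptable for your analysis.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_offset(wild_type, read):
    """Return the 0-based wild-type offset where the read fits with the fewest mismatches."""
    if len(read) > len(wild_type):
        raise ValueError("read is longer than the wild-type sequence")
    n_positions = len(wild_type) - len(read) + 1
    return min(
        range(n_positions),
        key=lambda i: hamming(wild_type[i:i + len(read)], read),
    )


def trim_wild_type(wild_type, offset, length):
    """Option 1: cut the wild-type sequence down to the region the reads cover."""
    return wild_type[offset:offset + length]


def pad_read(seq, qual, wild_type, offset, pad_qual="I"):
    """Option 2: embed a read in the full-length wild-type sequence.

    Flanking positions are filled with wild-type bases and an assumed
    placeholder quality character (pad_qual), so the returned sequence
    and quality strings both have the wild-type length.
    """
    left = wild_type[:offset]
    right = wild_type[offset + len(seq):]
    padded_seq = left + seq + right
    padded_qual = pad_qual * len(left) + qual + pad_qual * len(right)
    return padded_seq, padded_qual


# Toy example (made-up sequences, not the data from this issue).
wild_type = "ATGGCTAGCAAAGGTGAACTG"
read_seq = "AGCAAAGCTGAA"   # covers part of the wild type, one substitution
read_qual = "H" * len(read_seq)

offset = best_offset(wild_type, read_seq)
trimmed = trim_wild_type(wild_type, offset, len(read_seq))
padded_seq, padded_qual = pad_read(read_seq, read_qual, wild_type, offset)
```

With the first option, the value of `trimmed` would replace the wild-type sequence in the configuration. If the wild type is treated as coding, the trimmed window presumably also needs to stay in the reading frame, so check that against your own settings before rerunning.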

Perfect solution. Thank you for the quick response.