StevenWingett/FastQ-Screen

Discrepancy in ambiguous alignments between default and bisulfite mode

FelixKrueger opened this issue · 1 comment

Currently, there seems to be a discrepancy in how ambiguously mappable sequences are counted between the default mode and the --bisulfite mode. Here is an example of a human RRBS sample that was aligned with FastQ Screen in default mode:

[Image: non-bisulfite]

Default mode produces hardly any uniquely aligned reads, which is fine as this is a bisulfite library. Of note, ~35% of the sample consists of microsatellite sequences, multimers of (TGGAA)n (see also FelixKrueger/Bismark#265). This satellite repeat contamination, which is present in all animal species tested, is responsible for a generally low unique mapping efficiency.
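For anyone who wants to check the proportion of (TGGAA)n reads independently of the aligner, here is a minimal sketch. The function name `count_satellite_reads` and the `min_repeats=3` cutoff are illustrative assumptions, not anything FastQ Screen or Bismark does internally. It also ignores bisulfite conversion, so reverse-strand copies whose Cs were converted to Ts would be missed.

```python
import re


def count_satellite_reads(sequences, unit="TGGAA", min_repeats=3):
    """Return (hits, total): reads containing at least `min_repeats`
    tandem copies of `unit`, or of its reverse complement.

    Hypothetical helper for illustration only; the cutoff is an assumption.
    """
    complement = str.maketrans("ACGT", "TGCA")
    rev_unit = unit.translate(complement)[::-1]  # TGGAA -> TTCCA
    pattern = re.compile(
        "(?:{0}){{{2},}}|(?:{1}){{{2},}}".format(
            re.escape(unit), re.escape(rev_unit), min_repeats
        )
    )
    hits = 0
    total = 0
    for seq in sequences:
        total += 1
        # search() is unanchored, so the repeat may start at any offset
        if pattern.search(seq.upper()):
            hits += 1
    return hits, total


# Toy input; real use would iterate over the sequence lines of a FastQ file.
example_reads = [
    "ACGTTGGAATGGAATGGAAAC",  # three forward copies
    "ACGTACGTACGT",           # no repeat
    "TTCCATTCCATTCCA",        # three reverse-complement copies
    "TGGAATGGAA",             # only two copies, below the cutoff
]
hits, total = count_satellite_reads(example_reads)
print(hits, total)
```

Comparing this fraction with the ambiguous-alignment fraction reported in each mode would show whether the satellite reads are being dropped from the --bisulfite counts.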

When I ran FastQ Screen in --bisulfite mode, it did identify the sample as mainly human, but interestingly it did not show the ambiguously aligned microsatellite sequences in all species:

[Image: fq_screen_plot]

I suspect that the counting of ambiguous alignments in --bisulfite mode might be missing this contaminant. Maybe this has to do with the formatting of the read ID that is written out into the ambiguous.fastq file?

Added --score_min L,0,-0.6 as a Bismark/Bowtie2 mapping parameter so that FastQ Screen maps less stringently in --bisulfite mode, which is better for a QC tool and more consistent with non-bisulfite FastQ Screen mapping.

Git commit: 517bee1
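To put the --score_min value above in numbers: Bowtie2 reads a specification such as L,0,-0.6 as a function of read length, here a linear one, f(x) = 0 + (-0.6) * x. An alignment is reported only if its score is at least f(x). The sketch below evaluates such a specification. The helper name `min_score` is made up for illustration. The comparison values L,0,-0.2 (Bismark's default) and L,-0.6,-0.6 (Bowtie2's end-to-end default) are stated from memory of the respective manuals and should be checked against the versions in use.

```python
import math


def min_score(spec, read_length):
    """Evaluate a Bowtie2-style function spec "<type>,<constant>,<coefficient>".

    Types: C = constant, L = linear, S = square root, G = natural log.
    Hypothetical helper for illustration; not part of FastQ Screen or Bismark.
    """
    kind, constant, coefficient = spec.split(",")
    constant = float(constant)
    coefficient = float(coefficient)
    if kind == "C":
        return constant
    if kind == "L":
        return constant + coefficient * read_length
    if kind == "S":
        return constant + coefficient * math.sqrt(read_length)
    if kind == "G":
        return constant + coefficient * math.log(read_length)
    raise ValueError("unknown function type: %r" % kind)


# Compare thresholds for a few read lengths.
# The last two specs are assumed defaults, see the note above.
for length in (50, 100):
    for spec in ("L,0,-0.6", "L,0,-0.2", "L,-0.6,-0.6"):
        print(length, spec, min_score(spec, length))
```

For a 50 bp read, L,0,-0.6 allows a score as low as about -30, compared with about -10 for L,0,-0.2, so considerably more mismatches are tolerated and the threshold ends up close to what the L,-0.6,-0.6 setting gives in non-bisulfite mode.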