Discrepancy in ambiguous alignments between default and bisulfite mode
FelixKrueger opened this issue · 1 comments
Currently, there seems to be a discrepancy in the counting of ambiguously mappable sequences between the default mode and the --bisulfite mode. Here is an example of a human RRBS sample which was aligned with FastQ Screen in default mode:
It produces hardly any uniquely aligned reads, which is expected because this is a bisulfite library. Of note, ~35% of the sample consists of microsatellite sequences, multimers of (TGGAA)n (see also FelixKrueger/Bismark#265). This satellite repeat contamination, which is present in all animal species tested, is responsible for a generally low unique mapping efficiency.
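To put a rough number on this kind of contamination independently of any aligner, one could scan the raw reads for tandem copies of the repeat unit. The following is only a minimal sketch: the function names and the cutoff of three back-to-back copies are assumptions made for illustration, not part of FastQ Screen or Bismark, and it only looks at the read as sequenced (no reverse complement, no bisulfite conversion handling).

```python
def count_tandem_copies(seq, unit="TGGAA"):
    """Return the longest run of back-to-back copies of `unit` found in `seq`."""
    best = 0
    step = len(unit)
    for start in range(len(seq)):
        copies = 0
        pos = start
        # Extend the run for as long as another copy follows immediately.
        while seq.startswith(unit, pos):
            copies += 1
            pos += step
        best = max(best, copies)
    return best


def satellite_fraction(reads, unit="TGGAA", min_copies=3):
    """Fraction of reads holding at least `min_copies` tandem copies of `unit`.

    `min_copies` is a hypothetical cutoff chosen for this sketch.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(
        1 for read in reads
        if count_tandem_copies(read.upper(), unit) >= min_copies
    )
    return hits / len(reads)


if __name__ == "__main__":
    example_reads = ["TGGAATGGAATGGAATGGAA", "ACGTTGCAACGTTGCAACGT"]
    print(satellite_fraction(example_reads))
```

In practice the read sequences would come from every second line of each four-line FASTQ record; the list above is made-up toy data.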
When I ran FastQ Screen in --bisulfite mode, it did identify the sample as mainly human, but interestingly it did not show the ambiguously aligned microsatellite sequences in all species:
I suspect that the counting of ambiguous alignments in --bisulfite mode might be missing this contaminant. Maybe this has to do with the formatting of the read ID that is written out into the ambiguous.fastq file?
Added --score_min L,0,-0.6 as a Bismark/Bowtie2 mapping parameter so that FastQ Screen maps less stringently. This is better suited to a QC tool and more consistent with non-bisulfite FastQ Screen mapping.
Git commit: 517bee1
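For context on what the new setting changes: Bowtie2 reads a linear setting of the form L,a,b as the function a + b × read length, and in end-to-end mode an alignment is only reported if its score is at least that value. The sketch below simply evaluates this function for a few settings. The helper name and the 50 bp read length are illustrative assumptions, the two default values are quoted from the Bismark and Bowtie2 documentation and are worth checking against the installed versions, and any rounding Bowtie2 applies internally is ignored.

```python
def min_alignment_score(read_length, constant, coefficient):
    """Evaluate a Bowtie2-style linear score function: constant + coefficient * length."""
    return constant + coefficient * read_length


# (constant, coefficient) pairs for the L,<constant>,<coefficient> form.
SETTINGS = {
    "Bismark default (L,0,-0.2)": (0.0, -0.2),
    "FastQ Screen --bisulfite after this change (L,0,-0.6)": (0.0, -0.6),
    "Bowtie2 end-to-end default (L,-0.6,-0.6)": (-0.6, -0.6),
}

if __name__ == "__main__":
    read_length = 50  # illustrative read length, not taken from the issue
    for label, (constant, coefficient) in SETTINGS.items():
        threshold = min_alignment_score(read_length, constant, coefficient)
        print(f"{label}: minimum score {threshold:.1f}")
```

A more negative threshold lets alignments with more mismatches or gaps through, which is why the relaxed value brings --bisulfite mode closer to the behaviour of the default mode.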