amplab/snap

Low % of properly paired reads

carlosmag opened this issue · 10 comments

Hi,

I am getting a low % of properly paired reads with snap paired compared to bwa mem (24.19% vs 99.34%). These values are for trimmed input. Without trimming, samtools flagstat reports 0% properly paired reads after snap paired mapping.
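For reference, the "properly paired" percentage is derived from bit 0x2 of the SAM FLAG field. The sketch below approximates that figure from SAM text records; it is not samtools code, the function name `properly_paired_fraction` is made up for illustration, and skipping secondary and supplementary records is an assumption about how the count is normalized.

```python
# SAM FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def properly_paired_fraction(sam_lines):
    """Fraction of paired primary records that carry the proper-pair bit.

    Hypothetical helper: an approximation of the flagstat percentage,
    not a reimplementation of samtools.
    """
    paired = 0
    proper = 0
    for line in sam_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Assumption: only primary records are counted.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PAIRED:
            paired += 1
            if flag & PROPER_PAIR:
                proper += 1
    return proper / paired if paired else 0.0
```

Fed the records of a SAM file (for example the lines of an open text file), it returns a value between 0 and 1 that can be compared across aligners.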

test genome
reference sequence

Still, considering that there is little genetic diversity between Mycobacterium tuberculosis strains, I would expect the great majority of the reads to align without soft clipping.

For instance, for input without sequencing adapters, bbmap mapped 80% of the reads (73% properly paired) in its perfect mode (no substitutions or indels allowed relative to the reference), using global alignment only. Parameters were local=f perfectmode=t fast=t. If you want, I can provide a link to the trimmed fastq files.

Thanks for your time.

Thanks for your insights!
I might need to look for another genome to test.

Trimmed fastq files are here. You might have to right click them for regular download...

SNAP has a default hard min/max for paired-end spacing of 1 to 1000 bases

Maybe the problem is that the genome was sequenced on Illumina MiSeq and that reads actually overlap?
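To make the overlap idea concrete: two mates overlap whenever the sequenced fragment is shorter than the combined read length. The sketch below is only arithmetic; the helper name `mate_overlap` and the lengths in the usage note are illustrative assumptions, not measurements from this dataset.

```python
def mate_overlap(fragment_len, read_len_1, read_len_2):
    """Number of fragment bases covered by both mates (0 if they do not overlap).

    Hypothetical helper: a read cannot cover more bases than the fragment
    contains, so each read length is capped at the fragment length.
    """
    covered = min(read_len_1, fragment_len) + min(read_len_2, fragment_len)
    return max(0, covered - fragment_len)
```

For example, with 250-base reads a 400-base fragment gives `mate_overlap(400, 250, 250) == 100`, while a 600-base fragment gives no overlap.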

In the meantime, I passed bwa mem output to samclip --max 0 to remove clipped alignments and got ≃ 90% mapped reads, but only ≃ 25% properly paired with SNAP version 1.0beta.18. This is roughly the same value as with clipped SAM input.

Thanks for the detailed feedback!

I found that the key parameter was the spacing between paired-end reads. Setting -s 0 5000 increased the % of properly paired reads to ≃ 71% (≃ 97% mapped reads). -mrl 30 had a negligible effect (≃ 2%).
Still, there are about a quarter fewer properly paired reads than with bwa mem.
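A quick way to see how much a spacing window can matter is to check what share of observed template lengths (TLEN, column 9 of a SAM record) falls inside it. This is a rough sketch: treating the absolute TLEN as the spacing is an assumption, SNAP's exact definition is not given in this thread, and the function name and example values are hypothetical.

```python
def fraction_within_spacing(tlens, min_spacing, max_spacing):
    """Share of pairs whose absolute template length lies in [min_spacing, max_spacing].

    Assumption: a TLEN of 0 means the template length is unavailable,
    so those records are ignored.
    """
    sizes = [abs(t) for t in tlens if t != 0]
    if not sizes:
        return 0.0
    inside = sum(1 for s in sizes if min_spacing <= s <= max_spacing)
    return inside / len(sizes)
```

Running it on TLEN values taken from the bwa mem output, once with the 1 to 1000 window and once with 0 to 5000, would show how many pairs each window can admit at all.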

Will test latest dev version and consider it for large scale analyses.

Thanks for the algorithmic description!

I am not aware of inversions in M. tuberculosis.
Considering that some variant calling tools (such as Pilon) discard reads without the 'properly paired' flag, I was afraid this might have some impact. I guess only further testing will tell...