nygenome/lancet

Calling from secondary alignments

gilhornung opened this issue · 7 comments

Hi Giuseppe,

I noticed several cases where lancet reported a variant, and based on the IGV image (see example below), all the variants appear on secondary alignments. Is there a way to ask lancet not to use secondary alignments to call variants?

Thank you,

Gil

bam file with the problematic variant:
https://owncloud.incpm.weizmann.ac.il/index.php/s/HltGOlzT6wegHtp/download?path=%2F&files=lamc2.bam

IGV image: [lamc2_igv]

Currently there is no option to exclude secondary alignments. I may add one in the future if there is enough evidence that variants called from these reads are usually false positives.

In your specific example, however, it looks like these are supplementary alignments that have been marked as secondary (using the -M option in BWA). This means that, instead of the read mapping to multiple places in the genome (e.g., due to repeats), this is actually a split alignment, where one portion of the read maps to one place and the remaining bases map to another. As such, those reads should generally be used for variant calling. As an indication that these alignments are supplementary, they are shorter (63bp), contain hard clipping, and their counterpart supplementary alignments follow the same pattern.
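The reasoning above relies on the SAM FLAG bits (256 = secondary, 2048 = supplementary) and on the CIGAR string. The following is a minimal sketch of that heuristic, assuming plain SAM text records; the function names are hypothetical and this is not how lancet itself classifies reads. It flags records that carry the secondary bit but not the supplementary bit and whose CIGAR contains hard clipping (`H`), the pattern described for split alignments written with `bwa mem -M`.

```python
import re

# FLAG bits defined in the SAM specification
FLAG_SECONDARY = 0x100      # 256: secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # 2048: supplementary alignment

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def looks_like_split_marked_secondary(flag: int, cigar: str) -> bool:
    """Hypothetical heuristic: secondary bit set, supplementary bit not set,
    and hard clipping (H) present in the CIGAR string."""
    if not flag & FLAG_SECONDARY or flag & FLAG_SUPPLEMENTARY:
        return False
    return any(op == "H" for _, op in CIGAR_OP.findall(cigar))


def aligned_query_length(cigar: str) -> int:
    """Read bases consumed by M/I/=/X operations (clipped bases excluded)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MI=X")
```

For example, a record with FLAG 256 and CIGAR `63M88H` would be reported as a likely split alignment marked secondary, with 63 aligned bases; a record with FLAG 256 and CIGAR `151M` would not.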

Also note that, although lancet does not filter secondary alignments based on their alignment flag, reads that are highly likely to be multi-mapped are filtered using a combination of the AS and XS tags.
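The exact AS/XS rule lancet applies is not stated here, so the following is only a hypothetical sketch of the general idea: BWA reports the read's own alignment score in `AS` and the best suboptimal score in `XS`, and a read whose `XS` comes close to its `AS` has a competing placement. The `min_margin` value of 5 is an assumption chosen for illustration, not a lancet parameter.

```python
def parse_int_tags(fields):
    """Parse SAM optional fields of integer type (e.g. 'AS:i:140') into a dict.
    Fields of other types are skipped."""
    out = {}
    for field in fields:
        tag, typ, val = field.split(":", 2)
        if typ == "i":
            out[tag] = int(val)
    return out


def likely_multimapped(tags: dict, min_margin: int = 5) -> bool:
    """Hypothetical heuristic: treat a read as probably multi-mapped when its
    best alternative score (XS) is within min_margin of its own score (AS).
    The default margin is an illustrative assumption."""
    if "AS" not in tags or "XS" not in tags:
        return False  # no suboptimal hit reported: treat as uniquely placed
    return tags["AS"] - tags["XS"] < min_margin
```

Skipping non-integer fields matters because some other aligners reuse the `XS` name for a character-typed strand tag (`XS:A:+`), which should not be read as a score.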

Thanks for the response.

I used the -M option in BWA because the bwa manual says this resolves issues with Picard MarkDuplicates. Maybe this issue has been resolved and no one updated the manual...

I looked into secondary alignments further, and according to this GATK tutorial, secondary/supplementary alignments are ignored by MarkDuplicates. So maybe it is still worthwhile to disregard them, because a small number of them may be PCR duplicates that were never marked. From the IGV image, that seems to have happened here. Your call.

Both tools only consider primary mappings, even if mapped to different contigs, and ignore secondary/supplementary alignments (256 flag and 2048 flag) altogether
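As a rough sketch of what "only primary mappings" means at the record level (assuming plain SAM text; the helper names are hypothetical and this does not reproduce any tool's implementation), a record is primary when neither the 256 nor the 2048 FLAG bit is set:

```python
FLAG_SECONDARY = 0x100      # 256: secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def is_primary(flag: int) -> bool:
    """True when neither the secondary nor the supplementary bit is set."""
    return not flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY)


def primary_records(sam_lines):
    """Yield SAM body lines whose FLAG (second column) marks a primary record.
    Header lines starting with '@' are skipped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if is_primary(flag):
            yield line
```

For instance, FLAG 99 is primary, while 355 (99 + 256) and 2147 (99 + 2048) are not.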

What is your rationale for assuming that these 4 (supplementary alignment) reads are PCR duplicates? Also, as I mentioned before, supplementary alignments can represent true mutations, especially structural variants.

For (real) secondary alignments, it may be worth exploring the impact of using or discarding them for variant calling. If it becomes a serious issue, I may add an option to allow skipping these alignments. But for now I feel that this has low priority.

The latest release provides a new command-line parameter to request that only primary alignments be used for variant calling. Cheers!

Hello,

Does lancet ignore duplicate-marked reads by default?

Thanks!

Yes, as long as they are correctly marked in the BAM file.
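"Correctly marked" refers to the duplicate FLAG bit (1024), which tools such as Picard MarkDuplicates set on duplicate records. Here is a minimal sketch of skipping such records, assuming plain SAM text input; the function name is hypothetical and this is not lancet's code. Reads that were never marked (for example, secondary/supplementary records that MarkDuplicates ignores) pass through unchanged.

```python
FLAG_DUPLICATE = 0x400  # 1024: PCR or optical duplicate


def drop_marked_duplicates(sam_lines):
    """Yield SAM body lines whose FLAG does not have the duplicate bit set.
    Header lines starting with '@' are skipped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if not flag & FLAG_DUPLICATE:
            yield line
```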