Massive difference in SNP/SNV/CNV counts between lancet 1.0.7 and 1.1.0
complexgenome opened this issue · 6 comments
Hi Giuseppe,
I'm putting together a pipeline that uses lancet to call variants. The older version of the pipeline uses lancet v1.0.7, and I would like to move to the latest version (v1.1.0). I ran a comparison with three different outputs: 1) genome-wide, 2) CHR14, and 3) CHR22.
CHR22 is the smallest, CHR14 is the chromosome of significance for the disease of interest, and the genome-wide run was to check whether what was observed on CHR14 and CHR22 could be replicated.
Please find the numbers below:
number of SNPs/SNV/CNV | lancet 1.0.7 | lancet 1.1.0 |
---|---|---|
genome wide | 537,006 | 365,811 |
CHR14 | 16,979 | 11,511 |
CHR22 | 11,997 | 7,664 |
The same input BAM files were used with both lancet versions, and I used the same number of threads for all three outputs in both versions.
To get these numbers, I counted the number of lines in each output, excluding header lines that start with #.
I see a difference in SNP/SNV/CNV counts of ~36% on CHR22 and ~32% genome-wide, taking 1.0.7 as the reference when calculating these percentages.
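The relative drop can be recomputed directly from the table above. This is a minimal sketch in Python; the helper name `percent_drop` is illustrative, and v1.0.7 is taken as the reference, as in the post.

```python
# Record counts from the table above: (lancet 1.0.7, lancet 1.1.0).
counts = {
    "genome-wide": (537_006, 365_811),
    "CHR14": (16_979, 11_511),
    "CHR22": (11_997, 7_664),
}


def percent_drop(old, new):
    """Relative decrease from old to new, in percent, with old as the reference."""
    return 100.0 * (old - new) / old


# Round to one decimal place for reporting.
drops = {
    region: round(percent_drop(old, new), 1)
    for region, (old, new) in counts.items()
}

for region, pct in drops.items():
    print(f"{region}: {pct}% fewer records in v1.1.0")
```

With the counts from the table this gives roughly 31.9% genome-wide, 32.2% for CHR14, and 36.1% for CHR22.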
I checked the GitHub release notes but could not find anything significant that mentions these changes.
Could you please advise?
Thanks for reporting this. Indeed, that's a large difference in called variants. As part of the v1.1.0 release, both bamtools and htslib were upgraded, to v2.5.2 and v1.15.1 respectively. That's the only change that could "possibly" explain this (but I doubt it). Are you sure all the threads completed successfully? Please share the exact commands that you used for the two versions, and we'll try to reproduce on our side.
For lancet 1.0.7, please see below:
/sc/arion/projects/tools/lancet-1.0.7/lancet --tumor tumor.ApplyBQSR.bam --normal normal.ApplyBQSR.bam --ref bwa/GRCh38.primary_assembly.genome.fa --bed Twist_Exome_Target_hg38.padded.interval_list.bed --num-threads 36 > total_lancet.vcf
For lancet 1.1.0, the command below was used:
/hpc/minerva-centos7/lancet/1.1.0/lancet-1.1.0/lancet --tumor tumor.ApplyBQSR.bam --normal normal.ApplyBQSR.bam --ref bwa/GRCh38.primary_assembly.genome.fa --bed Twist_Exome_Target_hg38.padded.interval_list.bed --num-threads 36 > total_lancet_latest.vcf
Indeed there is a difference in ALL calls, but much less for the PASS-only variants. @sariya can you please check and post the numbers for your PASS variants? In general, as part of the linked-reads support, we have made some changes to how the MD tag is parsed, which could play a role here.
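One way to see which categories account for the difference in ALL calls is to tally the records of each output by their FILTER column. The sketch below is a rough illustration in Python, assuming plain-text (uncompressed) VCF output with the standard column order (FILTER is the 7th tab-separated field); the function name `filter_counts` is hypothetical and not part of Lancet.

```python
from collections import Counter


def filter_counts(vcf_lines):
    """Tally VCF data records by the value of their FILTER column."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines (both "##" meta lines and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # FILTER is the 7th column (0-based index 6)
    return counts


# Usage (file name taken from the command above):
# with open("total_lancet.vcf") as handle:
#     for name, n in filter_counts(handle).most_common():
#         print(name, n)
```

Running this on the output of each version and comparing the two tallies side by side would show whether the drop is concentrated in non-PASS records.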
@gnarzisi Thanks for looking into this.
Below are the counts for PASS variants in different lancet versions.
lancet 1.0.7 | lancet 1.1.0 |
---|---|
Pass: 35 | Pass: 42 |
These counts are genome-wide.
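Matching counts do not show whether the two versions agree on the same PASS calls, so it can help to compare the call sets themselves. This is a minimal sketch in Python, assuming uncompressed VCF files with the standard column order and treating a variant as identified by (CHROM, POS, REF, ALT); the names `pass_keys` and `compare_pass_calls` are illustrative only.

```python
def pass_keys(vcf_lines):
    """Return the set of (CHROM, POS, REF, ALT) keys for records with FILTER == PASS."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, flt = fields[0], fields[1], fields[3], fields[4], fields[6]
        if flt == "PASS":
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_pass_calls(old_lines, new_lines):
    """Split PASS calls into those shared by both outputs and those unique to each."""
    old, new = pass_keys(old_lines), pass_keys(new_lines)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}


# Usage (file names taken from the commands earlier in the thread):
# with open("total_lancet.vcf") as old_f, open("total_lancet_latest.vcf") as new_f:
#     result = compare_pass_calls(old_f, new_f)
#     print({name: len(calls) for name, calls in result.items()})
```

The size of the "shared" set relative to the two "only" sets indicates how much of the PASS call set is stable across versions.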
@gnarzisi Hi Giuseppe,
Did you get a chance to look at it?
Yes, we tested internally and indeed there is a difference on ALL variants, but much less on the PASS variants. Our current focus is on finalizing the release of the new version of the tool, Lancet2. If you are happy with v1.0.7, and do not have the resources to validate v1.1.0, my suggestion is to stick with v1.0.7 and wait for the official release of Lancet2 (later this year) for the upgrade.