"terminate called after throwing an instance of 'std::out_of_range'"
MariaNattestad opened this issue · 2 comments
Hi Tim
I got an error while running vcfdist on a DeepVariant output VCF.
Here is a reproducible example:
# Grab a chr20 subset for quick iteration:
bcftools view https://42basepairs.com/download/gs/deepvariant/case-study-outputs/1.5.0/PacBio/deepvariant.output.vcf.gz chr20 > deepvariant-small-sample.vcf
# Other inputs:
gs://deepvariant/case-study-testdata/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
gs://deepvariant/case-study-testdata/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
gs://deepvariant/case-study-testdata/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
DV="deepvariant-small-sample.vcf"
TRUTH="HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz"
TRUTH_REGIONS="HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed"
REF="GCA_000001405.15_GRCh38_no_alt_analysis_set.fa"
OUTPUT="vcfdist_on_deepvariant/"
mkdir $OUTPUT
vcfdist "${DV}" "${TRUTH}" "${REF}" --bed "${TRUTH_REGIONS}" --prefix "${OUTPUT}" --verbosity 1 2>&1 | tee vcfdist_on_deepvariant.log
Output:
[INFO vcfdist 13:40:31] Command: 'vcfdist deepvariant-small-sample.vcf /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/GRCh38.no_alt_analysis_set.fa.gz --bed /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed --prefix vcfdist_on_deepvariant/ --verbosity 1'
[INFO vcfdist 13:40:31]
[INFO vcfdist 13:40:31] Parsing VCF 'deepvariant-small-sample.vcf'
[WARN vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1278060, skipping
[WARN vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1288626, skipping
[WARN vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1339612, skipping
[WARN vcfdist 13:40:31] QUERY VCF variant overlap at chr20:2248990, skipping
[WARN vcfdist 13:40:31] QUERY VCF variant overlap at chr20:3220062, skipping
...
[INFO vcfdist 13:40:31] Haplotype 2
[INFO vcfdist 13:40:31] REF 0
[INFO vcfdist 13:40:31] SNP 71148
[INFO vcfdist 13:40:31] INS 5920
[INFO vcfdist 13:40:31] DEL 5642
[INFO vcfdist 13:40:31] CPX 0
[INFO vcfdist 13:40:31]
[INFO vcfdist 13:40:31] Contigs:
[INFO vcfdist 13:40:31] 'chr20' hap 1: 29067 variants
[INFO vcfdist 13:40:31] 'chr20' hap 2: 82710 variants
[INFO vcfdist 13:40:31]
[INFO vcfdist 13:40:31] VCF Overview:
[INFO vcfdist 13:40:31] VARIANTS 311095
[INFO vcfdist 13:40:31] KEPT HAP1 29067
[INFO vcfdist 13:40:31] KEPT HAP2 82710
[INFO vcfdist 13:40:31]
terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
I got the same error on the full VCF that was .gz with a .tbi index as well.
Thanks for raising an issue! What commit and version of vcfdist are you running? I just downloaded and tested the files you provided (subsetted to chr20
) and it runs fine on the most recent release (vcfdist v1.2.2
).
Side note: vcfdist
is designed to run on locally phased truth and query VCFs. It looks like neither of your VCFs are phased (as a result, reported precision and recall are around 75% when I run vcfdist
). A phased version of your truth VCF is available from the GIAB consortium here. For the query VCF, I recommend phasing with WhatsHap.
I'll plan to make a few changes based on this issue:
- add a warning if either input VCFs are unphased
- update README, making it more clear that phased VCFs are expected
- add
--branch v1.2.2
togit clone
instructions to ensure only more stable releases are cloned
Please let me know if you continue to have issues with a std::out_of_range()
error. At the moment, I'm unable to reproduce it.
Okay, the requirement of phasing the query VCF might be outside the scope for me, since that means I can't easily use vcfdist as a drop-in replacement for hap.py in my workflow -- I was hopeful since I wasn't sure how much the phasing mattered based on reading the abstract and skimming the paper.
But thanks for your help!
To answer your question:
git log | head
commit 6014c05f9b28dffc19cabf716925578986ed05a3
Author: Tim Dunn <me.timd1@gmail.com>
Date: Thu Apr 13 10:03:25 2023 -0400
...