Segmentation fault (core dumped)
Closed this issue · 14 comments
I get results, but the program returns an unsuccessful exit code.
LOG:
Estimation from OptimizeHeter:
Contaminating Sample PC1:-0.025539 PC2:-0.0565987
Intended Sample PC1:-0.0166733 PC2:-0.0297511
FREEMIX(Alpha):0.000110694
NOTICE - Success!
run.sh: line 7: 3722258 Segmentation fault (core dumped) VerifyBamID2 --Reference GRCh38_full_analysis_set_plus_decoy_hla.fa --BamFile test.bam --Output out_prefiex --NumThread 4 --SVDPrefix 1000g.phase3.100k.b38.vcf.gz.dat
MUGQICexitStatus:139
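For anyone reading the log above: an exit status of 139 is how the shell reports a process killed by signal 11 (128 + 11), which is SIGSEGV on Linux. This is consistent with the estimate being printed and the crash happening afterwards. A minimal sketch for decoding such a status; the helper name `describe_exit_status` is made up for illustration and is not part of VerifyBamID2:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (0-255).

    Shells report "killed by signal N" as 128 + N, so 139 maps to signal 11.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with error code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```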
Could you provide more details about your environment?
Hi @Griffan, I'm also getting this error but in my case it's when I'm trying to create new resource files for vbid2.
My command line is:
verifybamid2 \
--RefVCF resources/CCDG_13607_B01_GRM_WGS_2019-02-19_all.recalibrated_variants.subsetted.vcf.gz \
--Reference references/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.dict &> resources/log/vbid_reference.log
I get the following error:
VerifyBamID2: A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Version:2.0.1
Copyright (c) 2009-2020 by Hyun Min Kang and Fan Zhang
This project is licensed under the terms of the MIT license.
The following parameters are available. Ones with "[]" are in effect:
Available Options
Input/Output Files : --BamFile [Empty],
--PileupFile [Empty],
--Reference [references/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa],
--SVDPrefix [Empty],
--Output [result]
Model Selection Options : --WithinAncestry,
--DisableSanityCheck, --NumPC [2],
--FixPC [Empty],
--FixAlpha [-1.0e+00],
--KnownAF [Empty], --NumThread [4],
--Seed [12345], --Epsilon [1.0e-08],
--OutputPileup, --Verbose
Construction of SVD Auxiliary Files : --RefVCF [resources/CCDG_13607_B01_GRM_WGS_2019-02-19_all.recalibrated_variants.subsetted.reheadered.vcf.gz]
Pileup Options : --min-BQ [13], --min-MQ [2],
--adjust-MQ [40], --max-depth [8000],
--no-orphans, --incl-flags [1040],
--excl-flags [1796]
Deprecated Options : --UDPath [Empty], --MeanPath [Empty],
--BedPath [Empty]
NOTICE - Specified --RefVCF reference panel VCF file, doing SVD on the fly...
NOTICE - This procedure will generate SVD matrices as [RefVCF path].UD and [RefVCF path].mu
NOTICE - You may specify --SVDPrefix [RefVCF path](or --UDPath [RefVCF path].UD and --MeanPath [RefVCF path].mu) in future use
<SNIP>/verifybamid2: line 33: 49452 Segmentation fault: 11 $DIR/VerifyBamID "$@"
Do you know what could be causing this error? The vcf is not empty, and the header matches that in the vcf.
You asked the previous poster for information about their "environment". Could you clarify what information you are looking for?
Hi @yfarjoun, I drafted a PR (#65) on the branch "develop_branch_with_backward_cpp" that prints out the crash site.
Would you mind checking out this branch and posting the backtrace as a first step?
If that doesn't help, we may need to exchange a tiny test site so that I can debug locally.
Thanks for reporting this issue!
Here's the stack trace:
#8 Object "VerifyBamID", at 0x1028c6912, in main + 482
#7 Object "VerifyBamID", at 0x1028c314f, in execute(int, char**) + 3183
#6 Object "VerifyBamID", at 0x1028ce465, in SVDcalculator::ProcessRefVCF(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 69
#5 Object "VerifyBamID", at 0x1028cd1b2, in SVDcalculator::ReadVcf(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<std::__1::vector<char, std::__1::allocator<char> >, std::__1::allocator<std::__1::vector<char, std::__1::allocator<char> > > >&, int&, int&) + 4034
#4 Object "VerifyBamID", at 0x102923949, in String::AsInteger() const + 25
#3 Object "VerifyBamID", at 0x102923972, in String::AsInteger(long&) const + 18
#2 Object "libsystem_platform.dylib", at 0x7ff817d55c1c, in _sigtramp + 28
#1 Object "VerifyBamID", at 0x1028c74dd, in backward::SignalHandling::sig_handler(int, __siginfo*, void*) + 13
#0 Object "VerifyBamID", at 0x1028c7546, in backward::SignalHandling::handleSignal(int, __siginfo*, void*) + 70
[1] 67703 segmentation fault /Users/yossifarjoun/VerifyBamID/bin/VerifyBamID --RefVCF --Reference
Thanks, @yfarjoun! Could you also post a few lines of the VCF file? It seems to be related to the GT or PL fields.
I'm working with the 1000genomes file as input, so there's very little "secret" data here....
I added some fprintf lines and I have the format field that seems to throw it off.
I printed the position at each VCF iteration, and the sample index (and ID) at the beginning of each sample iteration.
The last position printed before the trace was 1228424, and the last sample index was 648 (HG01849),
so I ran zless FILE.vcf.gz | sed -n '/1228424/,$p' | head | cut -f 1-9,649
(adding 1 to the sample index, since cut is 1-indexed) and got:
chr1 1228424 . C T 681417 PASS AC=933;AF=0.187;AN=4992;BaseQRankSum=0.211;ClippingRankSum=0.081;DP=111710;ExcessHet=3.0318;FS=0.521;InbreedingCoeff=-0.0179;MLEAC=952;MLEAF=0.191;MQ=59.76;MQ0=0;MQRankSum=-0.062;NEGATIVE_TRAIN_SITE;POSITIVE_TRAIN_SITE;QD=16.18;ReadPosRankSum=0.54;SOR=0.726;VQSLOD=0.993;culprit=DP GT:AB:AD:DP:GQ:PGT:PID:PL 0/0:.:46,0:46:54:.:.:0,54,810
I think that both GT and PL look fine.... not sure what seems to be the problem.
here's the vcf up to the problematic line:
This is the problematic sample:"./.:.:42,0:42:.:.:.:.". I will try to fix it in the Debugging mode PR.
Missing genotypes may result in inaccurate PCA estimates. I would advise removing variants with many missing genotypes and filling in the remaining missing genotypes with best-guess genotypes (or dosages) before calculating PCs, which should be common practice.
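As a rough illustration of the first half of that advice (dropping sites with many missing genotypes), here is a minimal sketch in plain Python. It is not what PR #65 does. The function names and the 5% threshold are assumptions chosen for the example, and it assumes GT is the first FORMAT key, as the VCF spec requires when GT is present. In practice a dedicated VCF tool is likely a better choice for large panels.

```python
def missing_fraction(vcf_line: str) -> float:
    """Fraction of samples whose GT contains a missing allele ('.')."""
    fields = vcf_line.rstrip("\n").split("\t")
    samples = fields[9:]  # the first 9 columns are the fixed VCF columns
    if not samples:
        return 0.0
    missing = 0
    for sample in samples:
        gt = sample.split(":", 1)[0]  # assumes GT is the first FORMAT key
        alleles = gt.replace("|", "/").split("/")
        if any(allele == "." for allele in alleles):
            missing += 1
    return missing / len(samples)


def filter_sites(lines, max_missing=0.05):
    """Yield header lines and records at or below the missingness threshold.

    max_missing=0.05 is an arbitrary example value, not a recommendation.
    """
    for line in lines:
        if line.startswith("#") or missing_fraction(line) <= max_missing:
            yield line
```

A record like the one with the sample "./.:.:42,0:42:.:.:.:." counts that sample as missing, so the site is dropped only if enough other samples are missing as well.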
I have updated the PR to apply QC filters on each VCF record. #65
Thanks. When I looked for the errant sample using cut, I forgot to add 9 to the index to account for the fixed columns... 🤣
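For future readers hitting the same off-by-nine: a VCF record has 9 fixed columns (CHROM through FORMAT) before the first sample, so the `cut -f` column for a sample is the sample index plus 9, plus 1 more if the index is 0-based. A small sketch of that arithmetic; the helper names are hypothetical, and treating the printed index 648 as 0-based is an assumption:

```python
VCF_FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def cut_column(sample_index: int, zero_based: bool = True) -> int:
    """1-based column number (as used by `cut -f`) for a given sample index."""
    return VCF_FIXED_COLUMNS + sample_index + (1 if zero_based else 0)


def sample_field(vcf_line: str, sample_index: int, zero_based: bool = True) -> str:
    """Return the raw per-sample field (e.g. '0/0:.:46,0:...') from a VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[cut_column(sample_index, zero_based) - 1]


if __name__ == "__main__":
    # Under the 0-based assumption, sample index 648 lives in cut column 658.
    print(cut_column(648))
```

Under that assumption, the command would have needed `cut -f 1-9,658` rather than `cut -f 1-9,649`.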
@hyunminkang thanks for the reminder. I'm looking for a way to drastically increase the sensitivity of vbid2, and for that I need to use specific SNPs. It's odd that the 1KG SNPs have many missing genotypes...
If only a few samples are problematic and have many missing genotypes, I'll filter out those samples. If not, in lieu of removing the sites, I could also impute the missing genotypes. Does that make sense?
Not my issue to close, but my part in this issue is resolved.
The original issue should be different from the "--RefVCF" one, but the procedure to locate and report the specific crash site is the same. Should anyone encounter this error in the future, please refer to the "Debugging Mode" section of the README.
I will close this issue for now.