vgteam/vg

vg call report missing allele

Han-Cao opened this issue · 2 comments

1. What were you trying to do?

Run vg call to genotype sample

2. What did you want to happen?

Output a valid VCF

3. What actually happened?

The alternative allele is missing

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

NA

5. What data and command can the vg dev team use to make the problem happen?

I use the following command to call the SV from a haplotype sampled graph:

vg call -a -A -k $pack -S GRCh38 -t 40 -s HG00438 -O -z ${gbz}

The variant covered by another deletion has a "." in alternative allele. Does it mean "*" in VCF specification?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00438
chr1    3287345 >27889742>27889751      TTTACCGGGGCTGGAGCCGCCCCGCCACGCGGGCATCCAGGATTGCATTTACCGGGGCTGGAGCCGCCCCGCCACGCGGGCATCCAGGATTGCAT T       8.59884 PASS    AT=>27889742>27889743>27889745>27889746>27889748>27889750>27889751,>278897
42>27889751;DP=9;LV=0 GT:DP:AD:GL:GQ:GP:XD:MAD        0/0:9:8,1:-4.34941,-4.4999,-17.0462:1:-1.63343:20.6986:8
chr1    3287408 >27889745>27889748      CCGCCCC .       -25.3179        lowdepth        AT=>27889745>27889746>27889748;DP=1;LV=1;PS=>27889742>27889751  GT:DP:AD:GL:GQ:GP:XD:MAD        0/0:1:1:-2.53179:4:-1.42361:8.77778:1

Besides, vg deconstruct will not output this variant when I use the command: vg deconstruct -P GRCh38 -a -t 40 -O $gbz. In deconstruct's help, it says:

    -n, --nested             Write a nested VCF, including special tags. [experimental]
    -R, --star-allele        Use *-alleles to denote alleles that span but do not cross the site. Only works with -n

Does it mean only in nested mode will output this kind of variants? Should vg call follow the similar behavior when not using -n?

One important issue with the missing alternative allele is that vcfbub cannot parse it:

Error when include the alt="." record:

> bcftools view -Oz -o test.vcf.gz -r chr1:3287408 HG00438.hap48.vg_call.raw.vcf.gz
> vcfbub -l 0 -r 100000 -i test.vcf.gz | wc -l
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RecordParseError(289)', src/main.rs:38:47
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: vcfbub::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
0

No error when filter by "ALT!="."':

> bcftools view -i 'ALT!="."' -Oz -o test.vcf.gz -r chr1:3287408 HG00438.hap48.vg_call.raw.vcf.gz
> vcfbub -l 0 -r 100000 -i test.vcf.gz | wc -l
288

6. What does running vg version say?

vg version v1.58.0 "Cartari"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by anovak@courtyard.gi.ucsc.edu```

I'm currently working to bring some of the recent vg deconstruct options (nested, star allele) to vg call. For now . in call just means there's not path through the site. It could be because of nested deletion, or the haplotype is completely missing. I'm hoping this will be ready within the next month or so.

vcfbub supports . alleles. So I'm not sure what would be causing the crash. I don't know if the full backtrace would help. Are you able to share test.vcf.gz?

Sure, please find the vcf at test.vcf.gz.txt.

Below is the error message I got when using vcfbub:

> vcfbub  -i test.vcf.gz

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RecordParseError(290)', src/main.rs:38:47
stack backtrace:
   0:           0x48637c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::had12b8ebc27ab529
   1:           0x55bfde - core::fmt::write::hcef9c23bf27d039c
   2:           0x473111 - std::io::Write::write_fmt::h91d3c48a0b4c64c6
   3:           0x475f35 - std::panicking::default_hook::{{closure}}::hc107e5bad7c99d87
   4:           0x475be9 - std::panicking::default_hook::h38856b877e172286
   5:           0x47657f - std::panicking::rust_panic_with_hook::h66309baf5235212f
   6:           0x4866b7 - std::panicking::begin_panic_handler::{{closure}}::h3a147548aa082356
   7:           0x486494 - std::sys_common::backtrace::__rust_end_short_backtrace::hcc62583c733bef84
   8:           0x476092 - rust_begin_unwind
   9:           0x403a73 - core::panicking::panic_fmt::h8531284c14f462dc
  10:           0x403b53 - core::result::unwrap_failed::h6972a430d3981bcd
  11:           0x409fac - vcfbub::main::h79bc43e2af00842f
  12:           0x40a9d3 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd0c74e7f0fa26780
  13:           0x40bf49 - std::rt::lang_start::{{closure}}::he04869133740a9b9
  14:           0x471751 - std::rt::lang_start_internal::h0b83bd2f4c5c32a0
  15:           0x40a9a2 - main
  16:           0x4a89b9 - __libc_start_main
  17:           0x4041aa - _start
  18:                0x0 - <unknown>