Illumina/paragraph

Pysam error

jjfarrell opened this issue · 6 comments

When running paragraph on a test vcf with just one variant row, this error is triggered. Any suggestions?

2020-03-05 19:38:20,888 ERROR    Traceback (most recent call last):
2020-03-05 19:38:20,889 ERROR      File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 340, in run    vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
2020-03-05 19:38:20,889 ERROR      File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfupdate.py", line 218, in update_vcf_from_grmpy    record.samples[sample][k] = v
2020-03-05 19:38:20,889 ERROR      File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
2020-03-05 19:38:20,889 ERROR      File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
2020-03-05 19:38:20,890 ERROR      File "pysam/libcbcf.pyx", line 583, in pysam.libcbcf.bcf_check_values
2020-03-05 19:38:20,890 ERROR    TypeError: values expected to be 3-tuple, given len=1
Traceback (most recent call last):
  File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 353, in <module>
    main()
  File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 349, in main
    run(args)
  File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 340, in run
    vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
  File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfupdate.py", line 218, in update_vcf_from_grmpy
    record.samples[sample][k] = v
  File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
  File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
  File "pysam/libcbcf.pyx", line 583, in pysam.libcbcf.bcf_check_values
TypeError: values expected to be 3-tuple, given len=1

One of your header lines declares a field with Number=3, but the VCF entry has only one value.
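To check which FORMAT fields declare a fixed or genotype-dependent count, one option is to read the `Number` attribute straight from the header lines. This is a minimal sketch in plain Python, not part of Paragraph or pysam. The function name `format_numbers` and the sample header lines are made up for illustration, and the regular expression assumes `ID` and `Number` come first in each `##FORMAT` line, as is conventional.

```python
import re

# Matches header lines such as:
#   ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="...">
# Assumes ID is followed directly by Number (the conventional order).
FORMAT_RE = re.compile(r'^##FORMAT=<ID=([^,]+),Number=([^,]+),')


def format_numbers(header_lines):
    """Map each FORMAT ID to its declared Number, kept as a string."""
    numbers = {}
    for line in header_lines:
        match = FORMAT_RE.match(line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers


# Hypothetical header lines, for illustration only.
example_header = [
    '##fileformat=VCFv4.2',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype likelihoods">',
]

print(format_numbers(example_header))
```

A field declared as `Number=G` has no fixed length in the header; its expected length depends on the number of alleles in each record.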

Thanks for the quick response! I traced the error to the PL field (Number=G) when writing out genotypes.vcf.gz. It is not triggered when writing out variants.vcf.gz. The single value "." is what triggers the error. That is the value bcftools writes when merging samples if the genotype is missing, so I think the header is fine. It looks like the missing value '.' is not handled correctly when the file is written out.

GT:FT:GQ:PL:PR:SR:IF 0/0:.:.:.:.:.:0.782063 0/0:.:.:.:.:.:0.779178 0/0:.:.:.:.:.:0.78203
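For a diploid sample at a site with one ALT allele, a `Number=G` field such as PL is expected to hold 3 values, which matches the "3-tuple" in the error message. The sketch below shows how a lone missing value could be expanded to that length before it is assigned. It is plain Python under stated assumptions, not Paragraph's actual fix: the helper names `genotype_count` and `expand_missing` are hypothetical, and whether pysam accepts a tuple of `None` of the right length for a given field should be verified against the installed version.

```python
from math import comb


def genotype_count(n_alleles, ploidy=2):
    """Length of a Number=G field: the number of unordered genotypes.

    This is the number of multisets of size `ploidy` drawn from
    `n_alleles` alleles (REF plus ALTs).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


def expand_missing(value, expected_len):
    """Expand a lone missing value ('.' or None) to `expected_len` Nones.

    Any other value is returned as a tuple, unchanged in content.
    """
    values = value if isinstance(value, (tuple, list)) else (value,)
    if len(values) == 1 and values[0] in (None, '.'):
        return (None,) * expected_len
    return tuple(values)


# One REF and one ALT allele, diploid sample: 3 possible genotypes.
expected = genotype_count(n_alleles=2)
print(expected, expand_missing('.', expected))
```

Values that already have the right length, such as `(0, 10, 20)`, pass through unchanged.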

```
-rw-r--r-- 1 farrell casa 87K Mar 5 20:23 genotypes.vcf.gz
-rw-r--r-- 1 farrell casa 9.2K Mar 5 20:23 grmpy.log
-rw-r--r-- 1 farrell casa 8.5K Mar 5 20:23 genotypes.json.gz
-rw-r--r-- 1 farrell casa 4.0K Mar 5 20:23 variants.json.gz
-rw-r--r-- 1 farrell casa 113K Mar 5 20:23 variants.vcf.gz
-rw-r--r-- 1 farrell casa 218 Mar 5 20:23 sample.txt
```

Also, since I am trying to run paragraph on 5k CRAMs with SVs compiled from various callers, it would be nice to have an option so that the candidate VCF does not need to contain individual sample genotypes. The SV candidate VCF could then be distributed to other researchers to use for genotyping with paragraph.

If the candidate VCF is created without sample genotypes, the error disappears, because there is no GT='.' to copy over to the new VCF. The trade-off is that the new output no longer carries OLD_GT or the other stats from the original genotyping.

That's the default behavior.
When the genotyped sample is not in the input vcf, Paragraph will add a new sample column with GT.
When the genotyped sample is in the input vcf, Paragraph will output the genotype in GT field and move the original GT field to OLD_GT.
Do you still have the missing GT error?
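The workaround discussed in this thread, a candidate VCF with no sample columns, can be sketched as a small text filter. This is an illustration in plain Python rather than anything Paragraph provides: the function name `drop_genotypes` and the example record are hypothetical, and the input is assumed to be uncompressed, tab-separated VCF text. `bcftools view -G` should give a similar sites-only result.

```python
def drop_genotypes(vcf_lines):
    """Yield VCF lines with the FORMAT and sample columns removed."""
    for line in vcf_lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        if line.startswith('##'):
            # Meta-information lines pass through unchanged.
            yield line
        else:
            # '#CHROM' header line and records: keep the 8 fixed columns
            # (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
            yield '\t'.join(line.split('\t')[:8])


# Hypothetical input, for illustration only.
example_vcf = [
    '##fileformat=VCFv4.2\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n',
    'chr1\t1000\tsv1\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:PL\t0/0:.\t./.:.\n',
]

for out_line in drop_genotypes(example_vcf):
    print(out_line)
```

The unused `##FORMAT` header lines are left in place here; they are harmless for a sites-only file, though a stricter filter could drop them too.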

Hi @jjfarrell ,

Thanks for the tip. I also encountered the same error due to improper handling of "." or "Null" value in my input VCF sample genotype field. After excluding VCF samples from the update step, everything is back to normal.

Thanks,
Wei