Illumina/paragraph

Format error in vcf line:

mrwangyz opened this issue · 3 comments

Thank you for developing this software, it is very helpful to me.
However, I ran into a problem while using it. The error suggests a problem with my file format, but from reading your source code, the file it complains about appears to be generated by grmpy, which confuses me. Even after checking the source, I still can't find the problem. Here is my error message:
[E::idx_find_and_load] Could not retrieve index file for 'paragraph_inv/variants.vcf.gz'
2023-08-30 20:36:48,691 ERROR Traceback (most recent call last):
2023-08-30 20:36:48,691 ERROR File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 161, in update_vcf_from_grmpy record = header.new_record(contig=raw_record.chrom, start=raw_record.start, stop=raw_record.stop, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-30 20:36:48,692 ERROR File "pysam/libcbcf.pyx", line 2101, in pysam.libcbcf.VariantHeader.new_record
2023-08-30 20:36:48,692 ERROR File "pysam/libcbcf.pyx", line 3247, in pysam.libcbcf.VariantRecord.alleles.set
2023-08-30 20:36:48,692 ERROR ValueError: must set at least 2 alleles
2023-08-30 20:36:48,692 ERROR During handling of the above exception, another exception occurred:
2023-08-30 20:36:48,692 ERROR Traceback (most recent call last):
2023-08-30 20:36:48,699 ERROR File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 340, in run vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
2023-08-30 20:36:48,699 ERROR File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 164, in update_vcf_from_grmpy raise Exception("Format error in vcf line: " + str(raw_record))
2023-08-30 20:36:48,700 ERROR Exception: Format error in vcf line: chr1 4203 syri.INV.551237 . . . PASS SVLEN=2949;SVTYPE=INV;END=7152;GRMPY_ID=test_sort.vcf.gz@5b86c07c81908a94739dfe790e732ecf07909ff3fc7a02e1113cde7f9653acc5:1
Traceback (most recent call last):
File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 161, in update_vcf_from_grmpy
record = header.new_record(contig=raw_record.chrom, start=raw_record.start, stop=raw_record.stop,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pysam/libcbcf.pyx", line 2101, in pysam.libcbcf.VariantHeader.new_record
File "pysam/libcbcf.pyx", line 3247, in pysam.libcbcf.VariantRecord.alleles.set
ValueError: must set at least 2 alleles

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 353, in <module>
main()
File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 349, in main
run(args)
File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 340, in run
vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 164, in update_vcf_from_grmpy
raise Exception("Format error in vcf line: " + str(raw_record))
Exception: Format error in vcf line: chr1 4203 syri.INV.551237 . . . PASS SVLEN=2949;SVTYPE=INV;END=7152;GRMPY_ID=test_sort.vcf.gz@5b86c07c81908a94739dfe790e732ecf07909ff3fc7a02e1113cde7f9653acc5:1

Hi, how did you fix this error? I am getting a similar one.

Hello, glad to help you.
I read through the code that raised the error and located the problem. This error can be triggered in several ways, but they all come down to VCF format issues.
The program is written in Python, so reading the source to find the cause is not very difficult.
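
As one concrete case: in the record printed in the error message above, the REF and ALT columns are both `.`, which looks like what pysam rejects with "must set at least 2 alleles". Below is a minimal sketch for finding such records in an input VCF before running multigrmpy.py. It is not part of paragraph; the function names `open_vcf` and `find_missing_alleles` are made up for illustration, and it assumes a tab-separated VCF with the standard column order (CHROM, POS, ID, REF, ALT, ...). Other format problems may trigger the same exception, so a clean result here does not guarantee the file is accepted.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_missing_alleles(path):
    """Return (line_number, chrom, pos, id) for suspicious records.

    A record is reported when REF or ALT is empty or '.', or when the
    line has fewer than five tab-separated columns (pos and id are
    then reported as None).
    """
    problems = []
    with open_vcf(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                problems.append((line_number, fields[0], None, None))
                continue
            chrom, pos, var_id, ref, alt = fields[:5]
            if ref in ("", ".") or alt in ("", "."):
                problems.append((line_number, chrom, pos, var_id))
    return problems
```

Calling `find_missing_alleles("test_sort.vcf.gz")` (the input file named in the GRMPY_ID above) would list the records to fix; each one needs a real REF allele and at least one ALT allele.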

Thank you for your assistance; it was truly essential. It does look like the VCF is incorrectly formatted. I will read the program code and fix the VCF format.