skandlab/VarNet

VarNET outputs malformed VCFs

Closed this issue · 5 comments

I am using the latest VarNET docker image (as of this issue) for somatic calling, and it outputs a malformed VCF, which breaks the downstream analysis. Neither bcftools nor Picard accepts the file.

This is the header of the generated VCF.

[image: screenshot of the VCF header]

#### For bcftools

Writing to /tmp/bcftools.ZSi39Y
[W::bcf_hdr_parse] Could not parse header line: "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE"
[E::bcf_hdr_parse] Could not parse the header, sample line not found
Could not read VCF/BCF headers from analysis_ready_bams/somatic/VarNET/A2-61/A2-61.vcf
Cleaning

##### For Picard

!/com/intel/gkl/native/libgkl_compression.so
[Sun Mar 03 08:01:52 PKT 2024] FixVcfHeader --INPUT analysis_ready_bams/somatic/VarNET/A2-61/A2-61.vcf --OUTPUT analysis_ready_bams/somatic/VarNET/A2-61/A2-61.fixed.vcf --CHECK_FIRST_N_RECORDS -1 --ENFORCE_SAME_SAMPLES true --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Mar 03 08:01:52 PKT 2024] Executing as pmlab@pmlab-HP-Z8-G4-Workstation on Linux 6.5.0-21-generic amd64; OpenJDK 64-Bit Server VM 19.0.2+7-Ubuntu-0ubuntu322.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.5
[Sun Mar 03 08:01:52 PKT 2024] picard.vcf.FixVcfHeader done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1065353216
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: there are not enough columns present in the header line: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE, for input source: file:///media/pmlab/PML-Drive1/OSCC/mock/Whole_exomes/HNSCC1/analysis_ready_bams/somatic/VarNET/A2-61/A2-61.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:264)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:103)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:128)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:145)
at picard.vcf.FixVcfHeader.doWork(FixVcfHeader.java:122)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:309)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: there are not enough columns present in the header line: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:149)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:262)
... 9 more
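
Both tools reject the same `#CHROM` line, and Picard's "not enough columns" message would fit a header that is not tab-separated, since VCF requires tabs between the fixed columns. A minimal sketch for checking this on a single header line follows; the function names are hypothetical and are not part of bcftools, Picard, or VarNet.

```python
def tab_columns(header_line: str) -> int:
    """Count the tab-separated fields in one line, ignoring the line ending."""
    return len(header_line.rstrip("\r\n").split("\t"))


def chrom_line_is_valid(header_line: str) -> bool:
    """Check that a #CHROM line is tab-delimited with at least the 8 fixed VCF columns."""
    # The fixed columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO.
    return header_line.startswith("#CHROM\t") and tab_columns(header_line) >= 8


# A space-separated header collapses into a single field when split on tabs.
spaced = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE\n"
print(tab_columns(spaced), chrom_line_is_valid(spaced))
```

If the count comes back as 1, the parser is seeing the whole line as one column, which would match the errors above.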

#########################

Have you already solved this problem?

Sorry for the inconvenience @PML-research. This was fixed in e385c4f but the docker image was not updated. I have updated the docker image, so please download the latest docker image: docker pull kiranchari/varnet:latest.

Please delete the previously generated file analysis_ready_bams/somatic/VarNET/A2-61/A2-61.vcf and re-run predict.py to regenerate the vcf file. Please use the modified docker example command below as the location of the code in the docker image has changed (note -w /VarNet). I have also added a git pull command to download the latest code, if needed.

docker run -it --rm -v /data:/pikachu -w /VarNet kiranchari/varnet:latest /bin/bash -c "git pull; python predict.py --sample_name dream1 --normal_bam /pikachu/dream1_normal.bam --tumor_bam /pikachu/dream1_tumor.bam --processes 2 --output_dir /pikachu/varnet --reference /pikachu/GRCh37.fa"

Let me know if that resolves the issue.

@PML-research Sure, you can fix the issue by replacing the spaces in the #CHROM line of the VCF header with tabs, i.e.
change "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE" to "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE".

This is done in e385c4f.

There are no other changes in the update, so if that resolved the issue for you, you can proceed to use the VCF.
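
For an already generated file, the space-to-tab replacement described above could be scripted roughly as follows. This is only a sketch, not the code from e385c4f: the function names and file paths are hypothetical, and it assumes the sample name contains no whitespace.

```python
def retab_chrom_line(line: str) -> str:
    """Return the #CHROM header line with whitespace runs replaced by single tabs."""
    # Meta lines start with "##" and records do not start with "#",
    # so only the column header line matches this prefix.
    if line.startswith("#CHROM"):
        # Assumption: no column or sample name contains whitespace.
        return "\t".join(line.split()) + "\n"
    return line


def fix_vcf_header(src_path: str, dst_path: str) -> None:
    """Copy a VCF to dst_path, rewriting only the #CHROM line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(retab_chrom_line(line))
```

A call such as `fix_vcf_header("A2-61.vcf", "A2-61.fixed.vcf")` writes a new file and leaves the original untouched, so the result can be checked with bcftools before it replaces the old VCF.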