Unclear what the exception is referring to when htsjdk is used to validate a NIST .vcf file
namra1 opened this issue · 3 comments
Description of the issue:
htsjdk throws an error when trying to validate a NIST .vcf file. The .vcf file was generated by NIST, so I would expect it to have a valid format. The error message is unclear to me.
Your environment:
- version of htsjdk: 2.24.1
- version of java: v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
- which OS: v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
Steps to reproduce
gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
I also get an error when attempting to index the .vcf file with igv-tools version 2.10.3.
NIST_NA12878_calls_in_PLDv2.vcf.gz
Expected behaviour
It should validate the vcf file
Actual behaviour
gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
Using GATK jar /gatk/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.2.0-local.jar ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.103 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 28, 2021 10:27:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:27:12.247 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.247 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.2.0
22:27:12.247 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
22:27:12.248 INFO ValidateVariants - Executing as root@b5d24391e2e6 on Linux v5.8.0-59-generic amd64
22:27:12.248 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
22:27:12.248 INFO ValidateVariants - Start Date/Time: August 28, 2021 10:27:12 PM GMT
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.249 INFO ValidateVariants - HTSJDK Version: 2.24.1
22:27:12.249 INFO ValidateVariants - Picard Version: 2.25.4
22:27:12.249 INFO ValidateVariants - Built for Spark Version: 2.4.5
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:27:12.249 INFO ValidateVariants - Deflater: IntelDeflater
22:27:12.249 INFO ValidateVariants - Inflater: IntelInflater
22:27:12.250 INFO ValidateVariants - GCS max retries/reopens: 20
22:27:12.250 INFO ValidateVariants - Requester pays: disabled
22:27:12.250 INFO ValidateVariants - Initializing engine
22:27:12.659 INFO FeatureManager - Using codec VCFCodec to read file file:///test/NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.666 INFO ValidateVariants - Shutting down engine
[August 28, 2021 10:27:12 PM GMT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1182793728
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path NIST_NA12878_calls_in_PLDv2.vcf
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Flag is an unsupported type for this kind of field, for input source: NIST_NA12878_calls_in_PLDv2.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
... 13 more
Caused by: java.lang.IllegalArgumentException: Flag is an unsupported type for this kind of field
at htsjdk.variant.vcf.VCFCompoundHeaderLine.<init>(VCFCompoundHeaderLine.java:243)
at htsjdk.variant.vcf.VCFFormatHeaderLine.<init>(VCFFormatHeaderLine.java:50)
at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:198)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:26
@namra1 You're right that the error message isn't as helpful as it could be, but the file you provided does indeed appear to have an invalid header line:
##FORMAT=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
Not sure how that got generated, but the VCF spec does prohibit Type=Flag for a ##FORMAT line. Flag is only applicable to ##INFO fields.
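For anyone hitting the same exception, a quick way to locate the offending header line is to scan the VCF header for ##FORMAT lines that declare Type=Flag. The following is a minimal sketch in plain Python, not part of htsjdk or GATK; the function name `find_flag_format_lines` is hypothetical, and the regular expression is a simplification that assumes the text "Type=Flag" does not appear inside a Description string.

```python
import re

# Matches a ##FORMAT header line whose Type is Flag, e.g.
# ##FORMAT=<ID=DS,Number=0,Type=Flag,Description="...">
# Assumption: "Type=Flag" does not occur inside the Description text.
FLAG_FORMAT = re.compile(r'^##FORMAT=<.*\bType=Flag\b')


def find_flag_format_lines(lines):
    """Return (line_number, text) for each ##FORMAT line declaring Type=Flag."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("#"):
            break  # header ended; data records follow
        if FLAG_FORMAT.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of text lines, so an open file handle works directly; for a bgzipped file such as the attached .vcf.gz, the handle could come from `gzip.open(path, "rt")`.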
On second thought, reopening since we should fix the unhelpful error message to include some context.
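To illustrate what "some context" could look like, here is a small Python sketch of a check that reports the header line number and its text alongside the original message. It is only an illustration of the idea, not htsjdk's code or its eventual fix; the function name `check_format_type` and the exact message wording are assumptions.

```python
def check_format_type(line, line_number):
    """Raise ValueError naming the offending line if a ##FORMAT line uses Type=Flag."""
    # Hypothetical check: only ##FORMAT lines are rejected; ##INFO may use Flag.
    if line.startswith("##FORMAT=<") and "Type=Flag" in line:
        raise ValueError(
            "Flag is an unsupported type for this kind of field "
            f"(header line {line_number}: {line.strip()})"
        )
```

With a message of this shape, the reporter would have seen the DS ##FORMAT line quoted directly in the exception instead of having to search the header by hand.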