samtools/htsjdk

Unclear what the exception is referring to when htsjdk is used to validate a NIST .vcf file

namra1 opened this issue · 3 comments


Description of the issue:

htsjdk throws an error when I try to validate a .vcf file from NIST. Since the file was generated by NIST, I would expect its format to be valid, and the error message does not make clear what is wrong.

Your environment:

  • version of htsjdk
    ** HTSJDK Version: 2.24.1
  • version of java
    ** v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
  • which OS
    ** Linux v5.8.0-59-generic amd64 (as reported in the GATK log below)

Steps to reproduce


gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf

I also get an error when attempting to index the .vcf file with igv-tools version 2.10.3.
NIST_NA12878_calls_in_PLDv2.vcf.gz

Expected behaviour

It should validate the .vcf file without errors.

Actual behaviour

gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
Using GATK jar /gatk/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.2.0-local.jar ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.103 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 28, 2021 10:27:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:27:12.247 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.247 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.2.0
22:27:12.247 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
22:27:12.248 INFO ValidateVariants - Executing as root@b5d24391e2e6 on Linux v5.8.0-59-generic amd64
22:27:12.248 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
22:27:12.248 INFO ValidateVariants - Start Date/Time: August 28, 2021 10:27:12 PM GMT
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.249 INFO ValidateVariants - HTSJDK Version: 2.24.1
22:27:12.249 INFO ValidateVariants - Picard Version: 2.25.4
22:27:12.249 INFO ValidateVariants - Built for Spark Version: 2.4.5
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:27:12.249 INFO ValidateVariants - Deflater: IntelDeflater
22:27:12.249 INFO ValidateVariants - Inflater: IntelInflater
22:27:12.250 INFO ValidateVariants - GCS max retries/reopens: 20
22:27:12.250 INFO ValidateVariants - Requester pays: disabled
22:27:12.250 INFO ValidateVariants - Initializing engine
22:27:12.659 INFO FeatureManager - Using codec VCFCodec to read file file:///test/NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.666 INFO ValidateVariants - Shutting down engine
[August 28, 2021 10:27:12 PM GMT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1182793728
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path NIST_NA12878_calls_in_PLDv2.vcf
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Flag is an unsupported type for this kind of field, for input source: NIST_NA12878_calls_in_PLDv2.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
... 13 more
Caused by: java.lang.IllegalArgumentException: Flag is an unsupported type for this kind of field
at htsjdk.variant.vcf.VCFCompoundHeaderLine.<init>(VCFCompoundHeaderLine.java:243)
at htsjdk.variant.vcf.VCFFormatHeaderLine.<init>(VCFFormatHeaderLine.java:50)
at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:198)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:26

@namra1 You're right that the error message isn't as helpful as it could be, but the file you provided does indeed appear to have an invalid header line:

##FORMAT=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">

Not sure how that line got generated, but the VCF spec does prohibit Type=Flag on a ##FORMAT line: Flag is only valid for ##INFO fields.
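As a quick pre-check before running GATK, the Java sketch below scans only the meta-information (##) lines of a VCF and reports every ##FORMAT declaration that uses Type=Flag, the rule this file breaks. It is a hypothetical stand-alone helper: the class and method names are made up and are not part of htsjdk or GATK. It assumes Type appears unquoted inside the angle brackets, as in the line above, and that a .gz file is gzip/BGZF-compressed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper (not htsjdk/GATK): flags ##FORMAT header lines declared with Type=Flag. */
final class FormatFlagCheck {

    // A ##FORMAT line whose attribute list contains an unquoted Type=Flag entry.
    private static final Pattern FORMAT_WITH_FLAG =
            Pattern.compile("^##FORMAT=<.*[<,]Type=Flag[,>].*$");

    /** Returns "line N: text" (1-based) for each offending meta-information line. */
    static List<String> findFlagFormatLines(List<String> lines) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.startsWith("##")) {
                break; // meta-information ends at the #CHROM header line
            }
            if (FORMAT_WITH_FLAG.matcher(line).matches()) {
                problems.add("line " + (i + 1) + ": " + line);
            }
        }
        return problems;
    }

    /** Reads only the leading ## lines of a plain or gzip-compressed VCF. */
    static List<String> readMetaLines(Path path) throws IOException {
        List<String> meta = new ArrayList<>();
        try (InputStream raw = Files.newInputStream(path);
             InputStream in = path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null && line.startsWith("##")) {
                meta.add(line);
            }
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java FormatFlagCheck NIST_NA12878_calls_in_PLDv2.vcf.gz
        for (String problem : findFlagFormatLines(readMetaLines(Paths.get(args[0])))) {
            System.out.println("Type=Flag is not allowed in ##FORMAT: " + problem);
        }
    }
}
```

The check stops at the first line that does not start with "##", so it never reads the variant records of a large file. A Type=Flag string inside a quoted Description could produce a false positive; a real validator would parse the attributes properly.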

On second thought, reopening since we should fix the unhelpful error message to include some context.
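One way to add that context is to wrap per-line header parsing so that any failure names the offending line. The sketch below is hypothetical and uses only the JDK: ContextualHeaderParser, parseAll and toyFormatParser are illustrative names, and toyFormatParser merely stands in for htsjdk's header-line constructors. It does not reflect how htsjdk is actually structured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch, not htsjdk code: attach the offending header line to any
 * IllegalArgumentException raised while parsing it.
 */
final class ContextualHeaderParser {

    static <T> List<T> parseAll(List<String> lines, Function<String, T> lineParser) {
        List<T> parsed = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            try {
                parsed.add(lineParser.apply(line));
            } catch (IllegalArgumentException e) {
                // Keep the original message and cause, but say which line triggered it.
                throw new IllegalArgumentException(
                        "Invalid VCF header line " + (i + 1) + ": " + line + " (" + e.getMessage() + ")", e);
            }
        }
        return parsed;
    }

    /** Toy stand-in for a real header-line parser: rejects Type=Flag on ##FORMAT lines. */
    static String toyFormatParser(String line) {
        if (line.startsWith("##FORMAT=") && line.contains("Type=Flag")) {
            throw new IllegalArgumentException("Flag is an unsupported type for this kind of field");
        }
        return line;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "##fileformat=VCFv4.1",
                "##FORMAT=<ID=DS,Number=0,Type=Flag,Description=\"Were any of the samples downsampled?\">");
        try {
            parseAll(header, ContextualHeaderParser::toyFormatParser);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this wrapping, the user sees the line number and the full ##FORMAT line next to the original "Flag is an unsupported type for this kind of field" text, instead of having to search the header for it.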