Lost output/information during RenameSampleInVcf
Closed this issue · 12 comments
Bug Report
Affected tool(s)
RenameSampleInVcf
Description
Hello,
I sequenced enzymatic-methyl-seq samples, and then I called variants with cgmaptools. Lastly, I tried to merge all the vcf files with bcftools but the problem is that the samples have the same names.
I used RenameSampleInVcf to change the names of the samples but the outputs were very small files.
The files started as:
3573293264 Jun 11 14:22 V00001.mrkdup.vcf
3757642161 Jun 11 14:23 V00795.vcf
After running the tool, they ended up as:
595 Jun 11 14:27 V00001.mrkdup.vcf
588 Jun 11 14:27 V00795.vcf
Steps to reproduce
I ran the following code:
for i in *.vcf
do
base=$(basename $i ".vcf")
java -Xmx100G -jar /home/juaguila/appz/picard/build/libs/picard.jar RenameSampleInVcf \
INPUT=${base}.vcf \
OUTPUT=${base}.vcf \
NEW_SAMPLE_NAME=${base}
done
Expected behavior
Output files of roughly the same size as the input files
Actual behavior
File size significantly reduced
Thanks
Juan Pablo
@desmodus1984 You're using the same name for the input and output files - I would suggest using different names for the outputs.
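One way to follow this suggestion is to write the renamed files into a separate directory, so that no input is overwritten while it is still being read. The sketch below is only an illustration: `PICARD_JAR` is a placeholder for your own picard.jar path, and `run_picard` and `rename_samples` are hypothetical helper names. The Picard arguments are the same ones used in the loop from the report.

```shell
#!/usr/bin/env bash
# Assumption: PICARD_JAR points at your picard.jar (placeholder path).
PICARD_JAR="${PICARD_JAR:-/path/to/picard.jar}"

# Hypothetical wrapper so the Java invocation lives in one place.
run_picard() {
    java -jar "$PICARD_JAR" "$@"
}

# Rename the sample in every *.vcf in the current directory, writing
# each result into the directory given as $1 instead of over the input.
rename_samples() {
    local outdir="$1" vcf base
    mkdir -p "$outdir"
    for vcf in *.vcf; do
        [ -e "$vcf" ] || continue          # the glob matched nothing
        base=$(basename "$vcf" .vcf)
        run_picard RenameSampleInVcf \
            INPUT="$vcf" \
            OUTPUT="$outdir/$base.vcf" \
            NEW_SAMPLE_NAME="$base"
    done
}
```

Calling `rename_samples renamed` from the directory holding the VCFs would leave the originals untouched and place the renamed copies under `renamed/`.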
It might be worth running ValidateVariants (https://gatk.broadinstitute.org/hc/en-us/articles/21905106151963-ValidateVariants) on your VCF. I'd also check for the usual suspects (e.g. whitespace issues) that cause problems with VCFs. If all else fails, we might have to look at the VCF entry causing the exception.
Hi, I tried that and I still got this error:
A USER ERROR has occurred: Fasta dict file file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.dict for reference file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.fasta does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.
On the website: https://gatk.broadinstitute.org/hc/en-us/articles/21905106151963-ValidateVariants, it said:
Usage examples
Minimally validate a file for adherence to VCF format:
gatk ValidateVariants \
    -V cohort.vcf.gz
Validate a GVCF for adherence to VCF format, including REF allele match:
gatk ValidateVariants \
    -V sample.g.vcf.gz \
    -R reference.fasta -gvcf
I tried that:
gatk ValidateVariants \ -V V00001.mrkdup.vcf \ -R ../Bvos.fasta -gvcf
And I got the error:
Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar ValidateVariants -V V00001.mrkdup.vcf -R ../Bvos.fasta -gvcf
USAGE: ValidateVariants [arguments]
Validates a VCF file with an extra strict set of criteria.
Version:4.5.0.0
Required Arguments:
--variant,-V A VCF file containing variants Required.
Optional Arguments:
--add-output-sam-program-record
If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible
values: {true, false}
--add-output-vcf-command-line
If true, adds a command line header line to created VCF files. Default value: true.
Possible values: {true, false}
--arguments_file read one or more arguments files and add them to the command line This argument may be
specified 0 or more times. Default value: null.
......
A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ', -V{V00001.mrkdup.vcf{ -R{../Bvos.fasta}' but no positional argument is defined for this tool.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Then with:
gatk ValidateVariants -V V00001.mrkdup.vcf -R ../Bvos.fasta
Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar ValidateVariants -V V00001.mrkdup.vcf -R ../Bvos.fasta
00:11:21.227 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
00:11:22.232 INFO ValidateVariants - ------------------------------------------------------------
00:11:22.291 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.5.0.0
00:11:22.291 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
00:11:22.291 INFO ValidateVariants - Executing as juaguila@u05.panther.net on Linux v3.10.0-1160.105.1.el7.x86_64 amd64
00:11:22.291 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
00:11:22.293 INFO ValidateVariants - Start Date/Time: June 14, 2024 at 12:11:20 AM EDT
00:11:22.293 INFO ValidateVariants - ------------------------------------------------------------
00:11:22.294 INFO ValidateVariants - ------------------------------------------------------------
00:11:22.295 INFO ValidateVariants - HTSJDK Version: 4.1.0
00:11:22.295 INFO ValidateVariants - Picard Version: 3.1.1
00:11:22.296 INFO ValidateVariants - Built for Spark Version: 3.5.0
00:11:22.297 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:11:22.297 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:11:22.297 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:11:22.298 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:11:22.298 INFO ValidateVariants - Deflater: IntelDeflater
00:11:22.298 INFO ValidateVariants - Inflater: IntelInflater
00:11:22.299 INFO ValidateVariants - GCS max retries/reopens: 20
00:11:22.299 INFO ValidateVariants - Requester pays: disabled
00:11:22.300 INFO ValidateVariants - Initializing engine
00:11:22.312 INFO ValidateVariants - Shutting down engine
[June 14, 2024 at 12:11:22 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672
A USER ERROR has occurred: Fasta dict file file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.dict for reference file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.fasta does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.
Then when I tried running the code for creating the dictionary, I got another error:
gatk CreateSequenceDictionary -R Bvos.fasta
Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar CreateSequenceDictionary -R Bvos.fasta
INFO 2024-06-14 00:12:45 CreateSequenceDictionary Output dictionary will be written in Bvos.dict
00:12:45.681 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Jun 14 00:12:45 EDT 2024] CreateSequenceDictionary --REFERENCE Bvos.fasta --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Jun 14 00:12:45 EDT 2024] Executing as juaguila@u05.panther.net on Linux 3.10.0-1160.105.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.11-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.5.0.0
[Fri Jun 14 00:12:47 EDT 2024] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=285212672
Tool returned:
I installed gatk4 in a fresh conda environment.
I am lost as to how to fix the dictionary issue, or what else to try.
According to the logs you shared, CreateSequenceDictionary was successful (there should be a Bvos.dict file in the same directory as the Bvos.fasta file).
Did you try rerunning the ValidateVariants command after it?
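A quick way to confirm that the dictionary is where the tool expects it is sketched below. It assumes, based only on the error message in this thread (Bvos.dict for Bvos.fasta), that the expected name is the reference path with its last extension replaced by `.dict`. The function names `expected_dict` and `check_dict` are made up for this example.

```shell
# Derive the dictionary path from the reference path by replacing the
# final extension with ".dict" (Bvos.fasta -> Bvos.dict).
expected_dict() {
    printf '%s.dict\n' "${1%.*}"
}

# Report whether that dictionary file exists next to the reference.
check_dict() {
    local dict
    dict=$(expected_dict "$1")
    if [ -f "$dict" ]; then
        echo "found: $dict"
    else
        echo "missing: $dict"
        return 1
    fi
}
```

For example, `check_dict ../Bvos.fasta` looks for `../Bvos.dict` relative to the current directory, which is the same relative path the earlier ValidateVariants run used.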
(The notation with the \ in the documentation can be confusing. It's only there so that the multiple lines can be copied by users and pasted, newlines included, into the terminal. If you remove the newlines, you should also remove the backslashes.)
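The difference can be seen with a tiny helper that prints each argument the shell passes along. This is only an illustration of shell quoting, not anything GATK-specific; `show_args` is a name invented for the example.

```shell
# Print each argument on its own line, wrapped in brackets, to show how
# the shell split the command line.
show_args() {
    printf '[%s]\n' "$@"
}

# Backslash at the end of a line: a continuation, the newline disappears
# and "-V" arrives as an ordinary option.
show_args ValidateVariants \
    -V sample.vcf

# Backslash followed by a space on a single line: the space is escaped
# and becomes part of the next word, so the tool receives " -V".
show_args ValidateVariants \ -V sample.vcf
```

The first call prints `[-V]` as its second argument, while the second prints `[ -V]` with a leading space. An argument that starts with a space no longer looks like an option, which is consistent with the "Positional arguments were provided" error above.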
I ran it again, but changed the reference to the full path instead of just ../
and I got this:
gatk ValidateVariants -V V00001.mrkdup.vcf -R /home/juaguila/BombusMethylSeq/Rec-5/Bvos.fasta
Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar ValidateVariants -V V00001.mrkdup.vcf -R /home/juaguila/BombusMethylSeq/Rec-5/Bvos.fasta
13:57:02.368 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:57:03.331 INFO ValidateVariants - ------------------------------------------------------------
13:57:03.372 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.5.0.0
13:57:03.372 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
13:57:03.374 INFO ValidateVariants - Executing as juaguila@u05.panther.net on Linux v3.10.0-1160.105.1.el7.x86_64 amd64
13:57:03.374 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
13:57:03.374 INFO ValidateVariants - Start Date/Time: June 14, 2024 at 1:57:02 PM EDT
13:57:03.375 INFO ValidateVariants - ------------------------------------------------------------
13:57:03.375 INFO ValidateVariants - ------------------------------------------------------------
13:57:03.377 INFO ValidateVariants - HTSJDK Version: 4.1.0
13:57:03.377 INFO ValidateVariants - Picard Version: 3.1.1
13:57:03.377 INFO ValidateVariants - Built for Spark Version: 3.5.0
13:57:03.378 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:57:03.378 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:57:03.379 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:57:03.379 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:57:03.379 INFO ValidateVariants - Deflater: IntelDeflater
13:57:03.380 INFO ValidateVariants - Inflater: IntelInflater
13:57:03.380 INFO ValidateVariants - GCS max retries/reopens: 20
13:57:03.380 INFO ValidateVariants - Requester pays: disabled
13:57:03.381 INFO ValidateVariants - Initializing engine
13:57:03.706 INFO FeatureManager - Using codec VCFCodec to read file file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/rename/V00001.mrkdup.vcf
13:57:03.779 INFO ValidateVariants - Done initializing engine
13:57:03.779 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
13:57:03.779 WARN ValidateVariants - Other possible validations will still be performed
13:57:03.819 INFO ProgressMeter - Starting traversal
13:57:03.819 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:57:03.996 INFO ValidateVariants - Shutting down engine
[June 14, 2024 at 1:57:04 PM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=285212672
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 79: unparsable vcf record with allele Y, for input source: V00001.mrkdup.vcf
at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(AbstractVCFCodec.java:678)
at htsjdk.variant.vcf.AbstractVCFCodec.parseSingleAltAllele(AbstractVCFCodec.java:706)
at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(AbstractVCFCodec.java:648)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:443)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:384)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:356)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:317)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Any suggestion on how to deal with this?
My issue is that my samples are from whole-genome bisulfite sequencing, and I need to do a population differentiation analysis to test for adaptation.
Thanks;
Could you post the entry at this line:
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 79: unparsable vcf record with allele Y, for input source: V00001.mrkdup.vcf
Hi,
I checked line 79 of the VCF and was shocked to find this:
NW_022882922.1 5266 . C T,Y 0 PASS NS=1:DP=14:GU=T/C GT:GQ:DP 1/2:0:14
NW_022882922.1 5271 . C T,Y 0 PASS NS=1:DP=14:GU=T/C GT:GQ:DP 1/2:0:14
NW_022882922.1 5283 . C T,Y 0 PASS NS=1:DP=14:GU=T/C GT:GQ:DP 1/2:0:14
NW_022882922.1 5295 . C T,Y 0 PASS NS=1:DP=15:GU=T/C GT:GQ:DP 1/2:0:15
NW_022882922.1 5302 . C T,Y 0 PASS NS=1:DP=12:GU=T/C GT:GQ:DP 1/2:0:12
After that line, there are many sites with degenerate bases.
For example:
V00001.vcf
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 28895 . C T 0 PASS NS=1:DP=52 GT:GQ:DP 0/1:0:52
NW_022882922.1 36586 . C T,Y 0 PASS NS=1:DP=23:GU=T/C GT:GQ:DP 1/2:0:23
NW_022882922.1 36640 . G A 0 PASS NS=1:DP=40 GT:GQ:DP 1/1:0:40
NW_022882922.1 39071 . A G 0 PASS NS=1:DP=43 GT:GQ:DP 1/1:0:43
or
V00747.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 8082 . G A 0 PASS NS=1:DP=6 GT:GQ:DP 0/1:0:6
NW_022882922.1 11106 . T G 0 PASS NS=1:DP=19 GT:GQ:DP 0/1:0:19
NW_022882922.1 17828 . C G 0 PASS NS=1:DP=27 GT:GQ:DP 0/1:0:27
NW_022882922.1 25160 . G Y 0 PASS NS=1:DP=37:GU=T/C GT:GQ:DP 0/1:0:37
NW_022882922.1 27396 . G A,R 0 PASS NS=1:DP=33:GU=A/G GT:GQ:DP 1/2:0:33
NW_022882922.1 28342 . G A,R 0 PASS NS=1:DP=27:GU=A/G GT:GQ:DP 1/2:0:27
Any suggestion on how to filter out those weird degenerate bases (R/Y)?
Thanks;
Unfortunately, IUPAC-encoded bases are not allowed in VCFs. They could be encoded as symbolic alleles (<A>, <R>) and declared in the header if you need them for some reason. Htsjdk / Picard can't actually represent them at all. It comes up occasionally, so it would be good if we could process them, but we can't do it now. You'll have to use a different tool to either convert them to real bases or filter them out. I might ask on SEQAnswers (https://www.seqanswers.com/) to see if someone has one already.
Hi,
Do you know how I could delete them from the VCF?
Thanks,
You could run ValidateVariants with the --warn-on-errors flag to find all malformed lines and then write a small script to remove all those lines from your VCF. Now, I can't comment on how this might impact your downstream analysis, so that's my little disclaimer.
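As a rough sketch of such a script, the awk filter below keeps header lines and drops any record whose REF or ALT contains a base other than A, C, G, T or N. It is not an official tool, and it makes several assumptions: the file is tab-delimited, symbolic alleles such as `<DEL>`, `*` and `.` should be kept, and dropping the whole record is acceptable (the 1/2 genotypes would otherwise point at a removed allele). The function name `drop_iupac_records` is made up for this example.

```shell
# Print a VCF to stdout without records that carry IUPAC ambiguity
# codes (R, Y, ...) in REF or ALT. Header lines pass through unchanged.
drop_iupac_records() {
    awk -F'\t' '
        /^#/ { print; next }                      # header and column lines
        {
            ok = ($4 ~ /^[ACGTNacgtn]+$/)         # REF must be plain bases
            n = split($5, alts, ",")              # ALT may list several alleles
            for (i = 1; i <= n; i++) {
                a = alts[i]
                # Assumption: symbolic, spanning-deletion and missing
                # alleles are legitimate and are left alone.
                if (a ~ /^<.*>$/ || a == "*" || a == ".") continue
                if (a !~ /^[ACGTNacgtn]+$/) ok = 0
            }
            if (ok) print
        }
    ' "$1"
}
```

A typical call would be `drop_iupac_records V00001.mrkdup.vcf > V00001.filtered.vcf`, where the output name is arbitrary. Comparing record counts before and after shows how many sites were lost, which matters for the downstream caveat mentioned above.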
Hi,
I genotyped my samples again with software better suited to calling variants from bisulfite data and tried to validate the VCF files, but I ran into a different problem.
I downloaded the latest precompiled version of GATK, 4.6, and tried this command:
./gatk ValidateVariants -V /home/juaguila/BombusMethylSeq/Rec-5/mrkdup/V02055.bsg.vcf.gz -R /home/juaguila/BombusMethylSeq/Rec-5/Bvos.fasta
And I got this error:
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /home/juaguila/BombusMethylSeq/Rec-5/mrkdup/V02055.bsg.vcf.gz
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: This codec is strictly for VCFv4 and does not support VCFv4.4, for input source: /home/juaguila/BombusMethylSeq/Rec-5/mrkdup/V02055.bsg.vcf.gz
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:265)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:104)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:129)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
... 13 more
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: This codec is strictly for VCFv4 and does not support VCFv4.4
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:108)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
... 17 more
Could you please suggest how I can verify/validate the VCF file for errors like the one mentioned above (IUPAC bases)?
Thank you very much;
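The stack trace indicates that the header was rejected because it declares VCFv4.4, so no record was ever read. A check that does not depend on GATK is sketched below for gzip-compressed VCFs. It assumes tab-delimited records and treats symbolic alleles, `*` and `.` as valid. The function names `vcf_fileformat` and `count_non_acgtn_records` are invented for this example.

```shell
# Print the version declared on the first header line (##fileformat=...).
vcf_fileformat() {
    gzip -cd "$1" | head -n 1 | sed 's/^##fileformat=//'
}

# Count records whose REF or ALT contains anything other than plain
# A/C/G/T/N bases, for example IUPAC codes such as R or Y.
count_non_acgtn_records() {
    gzip -cd "$1" | awk -F'\t' '
        /^#/ { next }                             # skip header lines
        {
            bad = ($4 !~ /^[ACGTNacgtn]+$/)
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++) {
                a = alts[i]
                if (a ~ /^<.*>$/ || a == "*" || a == ".") continue
                if (a !~ /^[ACGTNacgtn]+$/) bad = 1
            }
            if (bad) count++
        }
        END { print count + 0 }                   # prints 0 when none found
    '
}
```

Running `vcf_fileformat V02055.bsg.vcf.gz` should echo the version string that the codec complained about, and a result of 0 from `count_non_acgtn_records` would suggest the new caller no longer emits ambiguity codes.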