samtools/htsjdk

Escaped doublequotes in INFO descriptions result in invalid VCF file

bartcharbon opened this issue · 2 comments

Edit 14/03: verified that this also occurs in version 3.0.4

Description of the issue:

When I add a header line whose description contains escaped double quotes, the escaping backslash sometimes goes missing, resulting in an invalid VCF file.

Your environment:

  • version of htsjdk: 1.24.1 and 3.0.4
  • version of java: OpenJDK 17.0.1
  • which OS: Windows and CentOS

Steps to reproduce

VCFHeader newHeader = annotator.annotateHeader(vcfFileReader.getFileHeader());

// Add a FORMAT header line whose description begins and ends with a double quote
newHeader.addMetaDataLine(new VCFFormatHeaderLine("TEST", VCFHeaderLineCount.A, VCFHeaderLineType.String, "\"TEST\""));

writer.writeHeader(newHeader);
//... write variants

Expected behaviour

A VCF file is written with a FORMAT header:
##FORMAT=<ID=TEST,Number=A,Type=String,Description="\"TEST\"">

Actual behaviour

A VCF file is written with a FORMAT header:
##FORMAT=<ID=TEST,Number=A,Type=String,Description=""TEST\"">

The backslash for the first escaped double quote is missing.

Addition: this seems to happen only for escaped quotes at the very start of the description.

Thanks for the bug report. Looks like the internal representation is correct ("""TEST""), but it gets serialized as ""TEST\"" by VCFHeaderLine.escapeQuotes.
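
For illustration, here is a minimal standalone sketch of how an escaping routine could produce exactly this output. It assumes the escaping is done with a regex that requires a non-backslash character in front of the quote; that is an assumption about the cause, not a copy of VCFHeaderLine.escapeQuotes. The class and method names (QuoteEscapeSketch, escapeRequiringPrecedingChar, escapeWithLookbehind) are hypothetical. A negative lookbehind is shown as one possible alternative, since it also matches a quote at index 0.

```java
class QuoteEscapeSketch {

    // Assumed problematic pattern: ([^\])" needs one character before the quote,
    // so a quote at the very start of the string can never match.
    static String escapeRequiringPrecedingChar(String value) {
        return value.replaceAll("([^\\\\])\"", "$1\\\\\"");
    }

    // Alternative: (?<!\)" matches any quote not preceded by a backslash,
    // including a quote at index 0, because a lookbehind consumes no character.
    static String escapeWithLookbehind(String value) {
        return value.replaceAll("(?<!\\\\)\"", "\\\\\"");
    }

    public static void main(String[] args) {
        String description = "\"TEST\""; // the six characters "TEST"

        // Leading quote is left unescaped: "TEST\"
        System.out.println(escapeRequiringPrecedingChar(description));

        // Both quotes are escaped: \"TEST\"
        System.out.println(escapeWithLookbehind(description));
    }
}
```

Wrapped in the surrounding Description="..." quotes, the first result gives ""TEST\"" as reported above, and the second gives the expected "\"TEST\"". Note that the lookbehind variant treats any quote preceded by a backslash as already escaped, so a description ending in an escaped backslash followed by a quote would need separate handling.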