Escaped doublequotes in INFO descriptions result in invalid VCF file
bartcharbon opened this issue · 2 comments
Edit 14/03: verified that this also occurs in version 3.0.4
Description of the issue:
When I add a header line whose description contains escaped double quotes, the escaping backslash sometimes goes missing, resulting in an invalid VCF file.
Your environment:
- version of htsjdk: 1.24.1 and 3.0.4
- version of java: OpenJDK 17.0.1
- which OS: Windows and CentOS
Steps to reproduce
VCFHeader newHeader = annotator.annotateHeader(vcfFileReader.getFileHeader());
newHeader.addMetaDataLine(new VCFFormatHeaderLine("TEST", VCFHeaderLineCount.A, VCFHeaderLineType.String, "\"TEST\""));
writer.writeHeader(newHeader);
//... write variants
Expected behaviour
A VCF file is written with a FORMAT header:
##FORMAT=<ID=TEST,Number=A,Type=String,Description="\"TEST\"">
Actual behaviour
A VCF file is written with a FORMAT header:
##FORMAT=<ID=TEST,Number=A,Type=String,Description=""TEST\"">
The backslash for the first escaped double quote is missing.
Addition: this seems to happen only for escaped quotes at the very start of the description.
Thanks for the bug report. Looks like the internal representation is correct ("""TEST""), but it gets serialized as ""TEST\"" by VCFHeaderLine.escapeQuotes.
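For illustration, here is a minimal, htsjdk-free sketch of how this kind of output can arise. It assumes the escaping is done with a regex that only matches a quote preceded by some non-backslash character; that is an assumption about the implementation, not a copy of it. Under that assumption a quote at index 0 has no preceding character and is never escaped, which matches the observation that only a leading quote is affected. The class and method names are hypothetical, and the lookbehind variant is just one possible fix.

```java
// Hypothetical stand-alone sketch; not the htsjdk source.
class EscapeQuotesSketch {

    // Assumed buggy behaviour: a quote is escaped only when a non-backslash
    // character precedes it, so a quote at the very start is skipped.
    // Regex: ([^\\])"   Replacement: $1\"
    static String escapeRequiringPrecedingChar(final String value) {
        return value.replaceAll("([^\\\\])\"", "$1\\\\\"");
    }

    // Possible fix: a negative lookbehind matches any quote that is not
    // preceded by a backslash, including a quote at index 0.
    // Regex: (?<!\\)"   Replacement: \"
    static String escapeWithLookbehind(final String value) {
        return value.replaceAll("(?<!\\\\)\"", "\\\\\"");
    }

    public static void main(final String[] args) {
        final String description = "\"TEST\""; // the characters: "TEST"

        // Wrap in quotes the way a Description="..." attribute would be written.
        System.out.println("\"" + escapeRequiringPrecedingChar(description) + "\""); // ""TEST\""
        System.out.println("\"" + escapeWithLookbehind(description) + "\"");         // "\"TEST\""
    }
}
```

The lookbehind version also handles two adjacent quotes, where the capture-group version would skip the second quote because the first one has already been consumed by the previous match.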