samtools/htsjdk

Merging VCFHeaders can result in incorrectly ordered contig lines

cmnbroad opened this issue · 0 comments

One case that produces incorrect output is merging two headers where one header's sequence dictionary is a subset of the other's. Given two headers, one with contigs "1" through "22" and one with only contigs "20" and "21", the merged header behaves as follows:

  • getContigLines returns the lines ordered "20", "21", "1", "2", "3", "4", ..., "19", "22" (all lines are included, but are out of order)
  • getSequenceDictionary has the lines ordered "20", "21", "3", "4", ..., "19", "22" (contigs "1" and "2" are missing entirely, remaining order is incorrect)
  • getMetaDataInSortedOrder (which is used by VCFWriter to serialize the VCFHeader on write) returns the same result as getSequenceDictionary (contigs "1" and "2" are missing entirely, remaining order is incorrect)

I suspect these results depend on the order in which the headers are merged, though.
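For reference, here is a minimal sketch of the ordering I would expect for the subset case described above. It does not use htsjdk at all: contig lines are modeled as plain strings, and `ContigMergeSketch`, `isOrderedSubset`, and `mergeContigs` are hypothetical names made up for illustration. Rejecting dictionaries that are not ordered subsets of one another is an assumption of the sketch, not a statement about how htsjdk behaves or should behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of htsjdk: contigs are modeled as plain names.
class ContigMergeSketch {

    // True if every name in 'sub' appears in 'full' in the same relative order.
    static boolean isOrderedSubset(List<String> sub, List<String> full) {
        int matched = 0;
        for (String name : full) {
            if (matched < sub.size() && sub.get(matched).equals(name)) {
                matched++;
            }
        }
        return matched == sub.size();
    }

    // Expected result for the subset case: the superset's contigs, in the
    // superset's order, regardless of which header is passed first.
    static List<String> mergeContigs(List<String> a, List<String> b) {
        if (isOrderedSubset(a, b)) {
            return new ArrayList<>(b);
        }
        if (isOrderedSubset(b, a)) {
            return new ArrayList<>(a);
        }
        // Assumption of this sketch: anything else is treated as incompatible.
        throw new IllegalArgumentException("Contig lists are not ordered subsets of one another");
    }

    public static void main(String[] args) {
        List<String> full = new ArrayList<>();
        for (int i = 1; i <= 22; i++) {
            full.add(String.valueOf(i));
        }
        List<String> subset = Arrays.asList("20", "21");

        // Both merge orders print contigs "1" through "22" in their original order.
        System.out.println(mergeContigs(subset, full));
        System.out.println(mergeContigs(full, subset));
    }
}
```

With the two headers from this issue, both merge orders yield "1" through "22" in order, which is what I would expect getContigLines, getSequenceDictionary, and getMetaDataInSortedOrder to agree on.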

See #1573, which is the root cause of much of this.