Merging VCFHeaders can result in incorrectly ordered contig lines
cmnbroad opened this issue · 0 comments
One case that produces incorrect output is merging two headers where one header's sequence dictionary is a subset of the other's. Given two headers, one with contigs "1" through "22" and one with contigs "20" and "21":
- getContigLines returns the lines ordered "20", "21", "1", "2", "3", "4", ..., "19", "22" (all lines are included, but are out of order)
- getSequenceDictionary has the lines ordered "20", "21", "3", "4", ..., "19", "22" (contigs "1" and "2" are missing entirely, remaining order is incorrect)
- getMetaDataInSortedOrder (which is used by VCFWriter to serialize the VCFHeader on write) gives the same result as getSequenceDictionary (contigs "1" and "2" are missing entirely, remaining order is incorrect)
I suspect, though, that these results depend on the order in which the headers are merged.
See #1573, which is the root cause of much of this.
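For reference, here is a minimal, self-contained sketch of the ordering I would expect for the subset case described above. It does not use htsjdk at all: contigs are modeled as plain strings, and the class and method names (ContigOrderSketch, mergeContigs, isOrderedSubset) are hypothetical. It assumes that when one contig list is an ordered subset of the other, the merged result should simply be the larger list in its original order, regardless of which header comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not htsjdk code.
final class ContigOrderSketch {

    // Returns true if every name in 'sub' appears in 'sup' in the same relative order.
    static boolean isOrderedSubset(List<String> sub, List<String> sup) {
        int matched = 0;
        for (String name : sup) {
            if (matched < sub.size() && sub.get(matched).equals(name)) {
                matched++;
            }
        }
        return matched == sub.size();
    }

    // Assumed expected behavior: if one list is an ordered subset of the other,
    // the merged order is the superset's order, independent of argument order.
    static List<String> mergeContigs(List<String> a, List<String> b) {
        if (isOrderedSubset(b, a)) {
            return new ArrayList<>(a);
        }
        if (isOrderedSubset(a, b)) {
            return new ArrayList<>(b);
        }
        throw new IllegalArgumentException("contig lists are not subset-compatible");
    }

    public static void main(String[] args) {
        // Build contigs "1" through "22", as in the example above.
        List<String> full = new ArrayList<>();
        for (int i = 1; i <= 22; i++) {
            full.add(Integer.toString(i));
        }
        List<String> subset = Arrays.asList("20", "21");

        // Both merge orders should yield the full list, in its original order.
        System.out.println(mergeContigs(full, subset));
        System.out.println(mergeContigs(subset, full));
    }
}
```

With this model, both calls in main print contigs "1" through "22" in order, which is what I would expect getContigLines, getSequenceDictionary, and getMetaDataInSortedOrder to agree on after the merge.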