Column order discrepancy between OS
Problem description
We're using HapCUT2 to phase VCF files and have seen that the phased output switches alleles 1 and 2 depending on whether osx or linux is used (copies A and B become B and A). Although the two results mean the same thing biologically, they produce different md5sums, which is confusing when developing cross-platform pipelines and software.
We're phasing according to the 10x linked reads specifications.
Command
HAPCUT2 --nf 1 --fragments <input-linked> --vcf <input-vcf> --out <out-phase> --outvcf 1
HAPCUT2 out-phase files
linux
BLOCK: offset: 6 len: 4 phased: 2 SPAN: 23514 fragments 1
6 0 1 chr1mini 24987 G C 0/1:.:960:124,134:250,238:99 0 . 41.00 1
9 0 1 chr1mini 48501 T A 0/1:.:903:158,146:246,214:99 0 . 41.00 1
osx
BLOCK: offset: 6 len: 4 phased: 2 SPAN: 23514 fragments 1
6 1 0 chr1mini 24987 G C 0/1:.:960:124,134:250,238:99 0 . 41.00 1
9 1 0 chr1mini 48501 T A 0/1:.:903:158,146:246,214:99 0 . 41.00 1
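The two outputs above differ only in that columns 2 and 3 are exchanged on every row. Until the order is deterministic, a check along these lines could stand in for the md5sum comparison. It is a minimal sketch: `same_phasing` is a hypothetical helper, not part of HapCUT2, and it assumes whitespace-separated rows from a single block and compares only the first three columns.

```python
def same_phasing(rows_a, rows_b):
    """Return True if two lists of variant rows from one block agree,
    either exactly or with the two haplotype columns exchanged."""
    def haps(rows):
        # Keep only (variant index, haplotype 1 allele, haplotype 2 allele).
        return [tuple(r.split()[:3]) for r in rows]

    a, b = haps(rows_a), haps(rows_b)
    swapped = [(idx, h2, h1) for idx, h1, h2 in b]
    return a == b or a == swapped
```

A swap is only accepted if it applies to the whole block, so a row that flips on its own is still reported as a difference.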
HapCUT2 starts from a random haplotype solution for each BLOCK, which can result in a different order of the two haplotypes. A potential solution would be to always output the haplotype with more '0's in the second column. I will let you know once this has been implemented.
Sounds good, thanks!
The fix has been implemented in branch 'merging_021820'. For each block, the haplotype with '0' allele for the first variant in the block is output first.
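For output produced by older builds, the same rule can be applied after the fact. The sketch below is not the code from the branch. `canonicalize_blocks` is a hypothetical helper that assumes block headers start with `BLOCK:`, variant rows start with a numeric index followed by the two haplotype columns, and fields are separated by tabs or single spaces. Where the first variant of a block is unphased, it uses the first row with a 0/1 pair to decide, which is also an assumption.

```python
def canonicalize_blocks(lines):
    """Exchange the two haplotype columns within each block so that the
    first phased variant of the block has '0' in the first haplotype column."""
    out = []
    flip = False      # whether rows of the current block must be swapped
    decided = False   # whether the current block's orientation is known yet
    for raw in lines:
        line = raw.rstrip("\n")
        ending = raw[len(line):]          # preserve a trailing newline, if any
        fields = line.split()
        if line.startswith("BLOCK:"):
            flip, decided = False, False  # a new block starts undecided
        elif len(fields) >= 3 and fields[0].isdigit():
            h1, h2 = fields[1], fields[2]
            if not decided and {h1, h2} == {"0", "1"}:
                flip, decided = (h1 == "1"), True
            if flip:
                fields[1], fields[2] = h2, h1
                sep = "\t" if "\t" in line else " "
                line = sep.join(fields)
        out.append(line + ending)         # other lines pass through unchanged
    return out
```

Applied to the osx output from the report, this should give the same rows as the linux output, so the two files can be compared byte for byte again.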
Awesome! Thanks!