ksamuk/pixy

Support for New Missing Data Formatting from GATK

ksamuk opened this issue · 4 comments

GATK has implemented a (quite radical) new way of encoding missing data that we will need to support going forward:
https://gatk.broadinstitute.org/hc/en-us/articles/6012243429531

Hi Kieran,

I wonder what the current version would produce with a new GATK-generated VCF file as input. Are the results reliable?
Do you have any suggestions in case the new GATK-generated VCF file cannot be used as-is?

Cheers,
Chen-Jui

Hi Chen-Jui,

I'm not quite sure at the moment; that is going to be a complex fix to implement. In the meantime, a quick fix might be to preprocess your data with bcftools, setting genotypes with DP < 1 to missing ("."), as below:

bcftools +setGT your.vcf.gz -- -t q -n . -e 'FMT/DP>=1'
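
For anyone wondering what that command does to each record, here is a rough Python sketch of the same masking idea. It is not pixy or bcftools code: the function name `mask_low_depth` is hypothetical, and treating a missing or absent DP as zero depth is an assumption of this sketch. It expects one tab-separated VCF line with a FORMAT column that contains both GT and DP.

```python
import re


def mask_low_depth(record: str, min_dp: int = 1) -> str:
    """Return a VCF record with GT set to missing where FORMAT/DP < min_dp.

    Hypothetical helper for illustration only; not part of pixy or bcftools.
    """
    line = record.rstrip("\n")
    fields = line.split("\t")
    # Leave header lines and records without sample columns untouched.
    if line.startswith("#") or len(fields) < 10:
        return line
    keys = fields[8].split(":")
    if "GT" not in keys or "DP" not in keys:
        return line
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Trailing FORMAT values may be omitted; treat an absent DP as missing.
        dp = values[dp_i] if dp_i < len(values) else "."
        # Assumption: a missing DP (".") counts as zero depth.
        depth = int(dp) if dp.isdigit() else 0
        if depth < min_dp:
            # Replace every allele with "." but keep the separators,
            # so ploidy and phasing symbols are preserved.
            values[gt_i] = re.sub(r"[^/|]+", ".", values[gt_i])
            fields[col] = ":".join(values)
    return "\t".join(fields)


if __name__ == "__main__":
    rec = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:0\t0/1:12\t1|1:."
    print(mask_low_depth(rec))
```

In this example the first and third samples (DP of 0 and a missing DP) have their genotypes masked, while the second sample (DP of 12) is left alone. For real data the bcftools command above is still the better choice, since it handles compressed input and the VCF header properly.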

Cheers,

Kieran

This is now addressed in the latest version of pixy.