Support for New Missing Data Formatting from GATK
ksamuk opened this issue · 4 comments
GATK has implemented a (quite radical) new way of encoding missing data that we will need to support going forward:
https://gatk.broadinstitute.org/hc/en-us/articles/6012243429531
Hi Kieran,
I wonder what the current version would produce with a new GATK-generated VCF file as input. Are the results reliable?
Do you have any suggestions if the new GATK-generated VCF file can't be used as-is?
Cheers,
Chen-Jui
Hi Chen-Jui,
I'm not quite sure at the moment; this is going to be a complex fix to implement. In the meantime, a quick workaround might be to preprocess your data using bcftools to set genotypes with DP < 1 to "." as below:
bcftools +setGT your.vcf.gz -- -t q -n . -e 'FMT/DP>=1'
Cheers,
Kieran
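For anyone who wants to see what the filter above amounts to, here is a rough Python sketch of the same idea: for each sample on a VCF data line, replace the genotype with a missing call when FORMAT/DP is below 1. The function name `mask_low_depth_genotypes` and the `min_dp` parameter are hypothetical, made up for this illustration; this is not how pixy or bcftools implement it, and leaving a missing DP (".") untouched is an assumption that may differ from what bcftools does.

```python
import re


def mask_low_depth_genotypes(vcf_line: str, min_dp: int = 1) -> str:
    """Return the VCF line with GT set to missing where FORMAT/DP < min_dp.

    Hypothetical helper for illustration only; not part of pixy or bcftools.
    """
    # Header lines and lines without sample columns pass through unchanged.
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return vcf_line

    # Column 9 (index 8) is FORMAT; it lists the per-sample keys in order.
    keys = fields[8].split(":")
    if "GT" not in keys or "DP" not in keys:
        return vcf_line
    gt_i, dp_i = keys.index("GT"), keys.index("DP")

    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # VCF allows trailing fields to be dropped, so DP may be absent.
        if dp_i >= len(values):
            continue
        dp = values[dp_i]
        # Assumption: a missing DP (".") is left as-is in this sketch.
        if dp != "." and int(dp) < min_dp:
            # Keep ploidy and separators: "0/0" -> "./.", "0|1" -> ".|."
            values[gt_i] = re.sub(r"[^/|]+", ".", values[gt_i])
            fields[col] = ":".join(values)

    return "\t".join(fields) + newline


if __name__ == "__main__":
    line = "chr1\t100\t.\tA\t.\t.\t.\t.\tGT:DP\t0/0:0\t0/0:12\n"
    # The first sample (DP=0) becomes ./.:0; the second (DP=12) is unchanged.
    print(mask_low_depth_genotypes(line), end="")
```

For real data the bcftools command is the better choice, since it handles compressed input and the full VCF/BCF specification; the sketch is only meant to make the logic of the workaround explicit.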
This is now addressed in the latest version of pixy.