Add support for arbitrary ploidy level (at least warn about incompatibility with polyploids)
taprs opened this issue · 1 comment
Thank you for this tool! The idea is pretty elegant and we have been needing the "official" script to do these simple stats for so long...
I understand that this was likely addressed in #79, but I want it to be said explicitly: we would like to see support for arbitrary ploidy levels! From our test runs, it seems that pixy silently takes only the first two alleles of each genotype in polyploids and thus dramatically lowers the pi estimate for them.
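To illustrate the effect, here is a minimal per-site sketch in plain Python. It is not pixy's code: the helper names (`called_alleles`, `site_pi`), the `max_alleles` parameter, and the three tetraploid genotypes are all made up for the example. It computes pi at one site as pairwise differences divided by pairwise comparisons, once with every allele counted and once with each genotype cut to its first two alleles.

```python
from collections import Counter


def called_alleles(gt, max_alleles=None):
    """Split a VCF GT string into its called alleles.

    max_alleles is a hypothetical knob that mimics keeping only the
    first N alleles of each genotype (the behaviour suspected above).
    """
    calls = gt.replace("|", "/").split("/")
    if max_alleles is not None:
        calls = calls[:max_alleles]
    # Drop missing calls (".") so they do not count as alleles.
    return [a for a in calls if a != "."]


def site_pi(genotypes, max_alleles=None):
    """Per-site pi: pairwise differences / pairwise comparisons."""
    counts = Counter()
    for gt in genotypes:
        counts.update(called_alleles(gt, max_alleles))
    n = sum(counts.values())
    if n < 2:
        return None  # fewer than two called alleles: pi is undefined
    comparisons = n * (n - 1) // 2
    # Pairs of identical alleles contribute no difference.
    identical = sum(c * (c - 1) // 2 for c in counts.values())
    return (comparisons - identical) / comparisons


# Three hypothetical tetraploid samples at one biallelic site.
gts = ["0/0/0/1", "0/0/1/1", "0/0/0/0"]

pi_full = site_pi(gts)                  # all 12 alleles: 27 / 66
pi_cut = site_pi(gts, max_alleles=2)    # first two alleles only: 0 / 15
print(pi_full, pi_cut)
```

Because unphased genotypes are usually written with the reference alleles first, cutting to two alleles discards mostly alternate alleles, so in this toy case the site looks monomorphic.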
It would be good, as a first quick fix, to add a warning (or maybe even an error?) if any genotype call in the input VCF has a ploidy other than 2. Beyond that, I have a dream of being able to use pixy with arbitrary ploidy levels, including cases where different samples or genomic positions have different ploidy levels...
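The quick fix could look roughly like the sketch below. This is only an illustration in plain Python, not a patch against pixy: the function names, the `expected` parameter, and the sample data are hypothetical, and it assumes the GT strings for one VCF record have already been pulled out into a dict keyed by sample name.

```python
import re
import warnings


def non_diploid_samples(sample_gts, expected=2):
    """Return {sample: ploidy} for genotypes whose ploidy != expected.

    Ploidy is taken as the number of allele slots in the GT string,
    separated by "/" or "|". Note that a bare "." counts as one slot,
    so it is reported as haploid here; whether that is desirable is
    an open design question.
    """
    offenders = {}
    for name, gt in sample_gts.items():
        ploidy = len(re.split(r"[/|]", gt))
        if ploidy != expected:
            offenders[name] = ploidy
    return offenders


def warn_on_ploidy(chrom, pos, sample_gts, expected=2):
    """Emit a warning if any genotype at this record is not diploid."""
    offenders = non_diploid_samples(sample_gts, expected)
    if offenders:
        warnings.warn(
            f"{chrom}:{pos}: genotypes with ploidy != {expected}: {offenders}"
        )
    return offenders


# Hypothetical record with one diploid, one tetraploid, one haploid call.
record = {"s1": "0/1", "s2": "0/0/1/1", "s3": "1"}
flagged = warn_on_ploidy("chr1", 1234, record)
```

Turning the warning into an error would only mean raising an exception instead of calling `warnings.warn`.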
Best wishes,
Nikita
I drafted a commit in my fork that outputs correct pi and dxy values for arbitrary ploidy levels, provided that the maximum number of alleles per site is given as the --ploidy argument: https://github.com/taprs/pixy
So far it messes up the n_missing counts, but I can improve it further if it can be merged into pixy later. I do not want to maintain my own fork 😈