qiime2/q2-types

EMP formats require Phred designation

Closed this issue · 2 comments

Bug Description
Currently it looks like the EMP formats assume Phred33 and do not check otherwise.

Screenshots
Importing Phred64 works, but makes a mess:

(image: plot of the imported Phred64 data)

References
forum xref

We've never done anything about this because, as far as I know, it's only possible to guess the Phred offset, e.g., based on the frequency of characters. The user-provided offset would need to be validated during import (because we normalize to Phred 33 during import), and we'd need a way to override that check if the guess was wrong, which is not something we can easily do while importing. It is less than ideal to just not validate the user's Phred offset, but the issue often becomes really obvious (as in the plot you shared), and Phred offset 64 data is really uncommon these days.
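For illustration, here is a rough sketch of what a character-based guess and the normalization to Phred 33 could look like. This is not the q2-types implementation; `guess_phred_offset` and `to_phred33` are hypothetical names, and the thresholds are assumptions based on the usual ASCII ranges (Phred+33 starts at `!`, code 33; Phred+64 starts at `@`, code 64; older Solexa+64 data can go down to `;`, code 59). It can only guess: a Phred 33 file containing nothing but high scores would be misread as Phred 64, which is why an override would be needed.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from the quality characters seen.

    Returns None when there is no data or the range is ambiguous.
    This is a heuristic only and can be wrong.
    """
    lowest = None
    for line in quality_lines:
        for ch in line.strip():
            code = ord(ch)
            if lowest is None or code < lowest:
                lowest = code

    if lowest is None:
        return None  # no quality characters at all
    if lowest < 59:
        # Characters this low cannot occur in any offset-64 encoding.
        return 33
    if lowest >= 64:
        # Consistent with Phred+64, but also with Phred+33 data that
        # happens to contain only scores of 31 or more (assumption).
        return 64
    # 59..63: could be Solexa+64 or Phred+33, so do not guess.
    return None


def to_phred33(quality, offset):
    """Re-encode one quality string as Phred+33."""
    if offset == 33:
        return quality
    if offset == 64:
        # Shift every character down by 64 - 33 = 31.
        return "".join(chr(ord(ch) - 31) for ch in quality)
    raise ValueError(f"unsupported Phred offset: {offset}")
```

A check during import could compare the guess against the user-provided offset and warn on a mismatch, rather than fail outright.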

@nbokulich, do you think this is enough of an issue in practice that we should address it with some sort of check based on frequency of the characters?

Hey, yeah, we could close this since Phred64 is rare these days, though it still crops up (e.g., with re-use of old datasets).