s4hts/HTStream

What should be done about NovaSeq base qualities?


Instead of quality scores ranging from 2 to 40, the NovaSeq (and maybe the iSeq 100 and NextSeq?) reports only four quality scores: 2, 12, 23 and 37.

  1. How should this be dealt with for quality score based window trimming?

  2. How should this be dealt with for overlapping reads and resolving mis-matched bases?

  3. Should data from a NovaSeq, etc., be made to conform to the 2, 12, 23, 37 quality scheme (introducing more complexity to the different algorithms)?

  4. Perhaps a new tool that tries to improve/correct quality scores could be useful.
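
To make question 1 concrete, here is a rough Python sketch of what the four bins mean as error probabilities and how a mean-quality window behaves on them. This is not HTStream's code: `trim_right`, its window size of 5 and its threshold of 20 are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement

# The four quality values reported by the NovaSeq, as listed above.
NOVASEQ_BINS = (2, 12, 23, 37)


def phred_to_error(q):
    """Standard Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def window_means(bins, window):
    """All distinct mean qualities a window of `window` bases can take."""
    combos = combinations_with_replacement(bins, window)
    return sorted({sum(c) / window for c in combos})


def trim_right(quals, window=5, threshold=20):
    """Hypothetical sliding-window trimmer (an assumption, not HTStream's).

    Returns the start index of the first window whose mean quality is
    below `threshold`; the read would be cut there. Returns len(quals)
    if no window falls below the threshold.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


if __name__ == "__main__":
    for q in NOVASEQ_BINS:
        print(f"Q{q}: P(error) = {phred_to_error(q):.4f}")
    print("possible means, window of 5:", window_means(NOVASEQ_BINS, 5))
    read = [37] * 6 + [23, 12, 12, 2, 2]
    print("cut position:", trim_right(read))
```

With binned scores the window mean can only take a limited set of values, so a fixed threshold behaves differently than it does on 2 to 40 data. Any threshold tuned on older instruments may need to be revisited.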

See
https://lh3.github.io/2017/07/24/on-nonvaseq-base-quality
and
http://lh3.github.io/2014/11/03/on-hiseq-x10-base-quality
for some discussion on this issue.

Merging in issue #142.