wdecoster/nanostat

qscore read count discrepancy


Hi,
I've noticed that when I run NanoStat on a sequencing_summary file and then on the FASTQ files generated by guppy (6.1.1, default qscore cutoff = 10), the read counts in the quality cutoff section do not match. I suspect the qscore values generated by guppy (column 15 in the summary file) are calculated differently from what NanoStat computes directly from the FASTQ files, but in some cases the difference is 200k+ reads. Is this expected behavior?

e.g.

sequencing_summary.txt
>Q5    1118963 (99.1%) 2147.0Mb
>Q7    1044452 (92.5%) 2007.4Mb
>Q10    819696 (72.6%) 1575.8Mb
>Q12    594249 (52.6%) 1144.5Mb
>Q15    120048 (10.6%) 231.4Mb

FASTQ file
>Q5:	602866 (100.0%) 1189.2Mb
>Q7:	602866 (100.0%) 1189.2Mb
>Q10:	589012 (97.7%) 1162.5Mb
>Q12:	407558 (67.6%) 805.3Mb
>Q15:	72742 (12.1%) 142.9Mb

I know I should just stick with the stats generated from the FASTQ files and move on, but I'm hoping for a sanity check here.
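
As a quick sanity check on the numbers above, the total read count behind each report can be back-calculated from the ">Q5" line (count divided by percentage). This is only a sketch: `implied_total` is a hypothetical helper, and because the percentages are rounded to one decimal the summary total is approximate.

```python
def implied_total(count, percent):
    """Estimate the total number of reads from a cutoff count and its percentage."""
    return round(count / (percent / 100))

# Values copied from the ">Q5" lines of the two reports above.
summary_total = implied_total(1118963, 99.1)   # sequencing_summary.txt
fastq_total = implied_total(602866, 100.0)     # FASTQ file

# Reads above Q10 according to the summary, versus all reads in the FASTQ.
summary_q10 = 819696
print(summary_total, fastq_total, summary_q10 - fastq_total)
```

Under these numbers the summary covers roughly 1.13 million reads while the FASTQ holds 602,866, and the summary reports about 217k more reads above Q10 than the FASTQ contains in total.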

Thanks!

Hi,

Do you mean the values for the FASTQ file were generated from only those reads passing the Q-score cutoff?
There are definitely subtle differences (for reasons I don't know) between how the quality is calculated within guppy (and thus reported in the summary) and how it is written to the FASTQ.
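
To illustrate how much the averaging method alone can shift a read's mean quality, here is a minimal sketch comparing two ways of averaging the per-base Phred scores of a FASTQ quality string. The function names are hypothetical, the Phred+33 offset is assumed, and nothing in this thread confirms which method guppy or NanoStat actually uses.

```python
import math

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]

def mean_q_via_error_prob(quals):
    """Average the per-base error probabilities, then convert back to a Phred score."""
    probs = [10 ** (-q / 10) for q in quals]
    return -10 * math.log10(sum(probs) / len(probs))

def mean_q_arithmetic(quals):
    """Plain arithmetic mean of the Phred scores."""
    return sum(quals) / len(quals)

# Toy read: four bases at Q20 ('5') and four at Q40 ('I').
quals = phred_scores("5555IIII")
print(mean_q_arithmetic(quals), mean_q_via_error_prob(quals))
```

For this toy read the arithmetic mean is 30, while averaging error probabilities gives just under 23, so a read can land on either side of a cutoff depending on the method.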

Wouter