hasindu2008/slow5tools

fast5 to blow5 conversion and back to fast5 gives size difference


Hello,

I'm testing back-and-forth conversion of fast5/blow5 files to check that the process is lossless.

First:
slow5tools-v0.4.0/slow5tools f2s fast5dir/ -d blow5dir/
OK: 1 fast5 (387 MB) gives 1 blow5 (200 MB)

Then:
slow5tools-v0.4.0/slow5tools s2f blow5dir/ -d blow5tofast5dir/
Unexpected difference: 1 blow5 (200 MB) gives 1 fast5 (307 MB)

What could explain this size difference between the two fast5 files (original and blow5-converted)? Is it due to the directory structure?

P.S.: Converting both fast5 files (original and blow5-converted) to fastq shows no difference in size or content (apart from read order).
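
For anyone who wants to repeat the fastq check above, here is a minimal sketch of an order-independent comparison. It is not part of slow5tools; the function name `fastq_same_reads` is hypothetical, and it assumes strict 4-line FASTQ records (no wrapped sequence or quality lines).

```shell
# Hypothetical helper: succeed if two FASTQ files contain the same records,
# regardless of read order. Assumes each record is exactly 4 lines.
fastq_same_reads() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    # Join every 4 lines into one tab-separated line, then sort bytewise.
    paste - - - - < "$1" | LC_ALL=C sort > "$tmp_a"
    paste - - - - < "$2" | LC_ALL=C sort > "$tmp_b"
    # cmp -s is silent and exits 0 only when the sorted records are identical.
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

It could be called as `fastq_same_reads original.fastq roundtrip.fastq && echo "same reads"`, where both file names are placeholders. Headers are compared as well, so any basecaller-added header field that differs between runs would be reported as a difference.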

Thanks in advance

This difference can be due to fragmented wasted space in the original FAST5 (see #50), to space consumed by redundantly storing the basecalled FASTQ read (see #70), or to both. When a SLOW5 is converted back to FAST5, that fragmented wasted space is no longer there. Also, we do not store the basecalled read inside SLOW5, so it is not included in the converted-back FAST5. Recently, even ONT has stopped putting the basecalled read into the FAST5 in the latest MinKNOW, perhaps learning from SLOW5.

Thank you so much for your reply.
Less is more, SLOW is FAST.