nanoporetech/pod5-file-format

fast5 > pod5 conversion is very slow when input and output are on the same physical HDD?

ItokawaK opened this issue · 2 comments

Pod5 version: 0.3.2

I apologize if this has already been described somewhere, but I could not find it, so let me report it here.

I tried to convert 8 fast5 files (~6 GB in total) into a single pod5 file.
The fast5 files were stored on an HDD, say hdd_1.

When I passed a directory on hdd_1 to "-o", the conversion took approx. 8 min.

$ pod5 convert fast5 -o dir/in/hdd_1 *fast5 
Converting 8 Fast5s: 100%|#############| 29264/29264 [07:57<00:00, 61.25Reads/s]

On the other hand, when I passed a directory on another HDD, hdd_2, to "-o", it took only 31 s.

$ pod5 convert fast5 -o dir/in/hdd_2 *fast5 
Converting 8 Fast5s: 100%|#############| 29264/29264 [00:31<00:00, 942.41Reads/s]

It also took around 31 s when the input was on an HDD and the output was on an SSD, or when the input and output were on the same SSD.
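For reference, the size of the gap can be worked out from the two progress bars above. This is only a small arithmetic sketch in Python; the helper name `to_seconds` is made up for illustration, and the numbers are copied from the output shown.

```python
def to_seconds(mm_ss):
    """Convert a tqdm-style 'MM:SS' elapsed time to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

reads = 29264                       # total reads, from the progress bars
same_hdd = to_seconds("07:57")      # output on the same HDD as the input
other_hdd = to_seconds("00:31")     # output on a different HDD

slowdown = same_hdd / other_hdd
print(f"same HDD:  {same_hdd} s ({reads / same_hdd:.1f} reads/s)")
print(f"other HDD: {other_hdd} s ({reads / other_hdd:.1f} reads/s)")
print(f"slowdown:  {slowdown:.1f}x")
```

So 477 s versus 31 s works out to a bit more than 15 times slower for the same-HDD case.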

@ItokawaK, HDDs use a read/write head which must physically move to read and write data. It is not surprising that reading from and writing to the same HDD is slower, as the reads and writes compete for time with the head, which must be moved to the correct position on disk each time. Meanwhile, SSDs do not have this issue and have much better random-access read/write performance.

@HalfPhoton Thanks for your comment. Yes, it is not surprising that reading and writing on the same HDD is slower, but I did not expect such a large difference (~15x slower).

Thus, I just wanted to note it here so that others can avoid the same pitfall. Thank you!
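For anyone who wants to guard against this before starting a long conversion, here is a minimal sketch in Python that checks whether the input and output paths report the same device ID. It is not part of pod5; the function names `nearest_existing` and `same_device` are hypothetical, and it relies only on `os.stat(...).st_dev` from the standard library. Note the limitation: `st_dev` identifies a filesystem, not a physical drive, so two partitions on the same HDD will look like different devices. A `True` result is a useful warning, but `False` does not guarantee separate disks.

```python
import os
from pathlib import Path

def nearest_existing(path):
    """Walk up from `path` until an existing directory or file is found."""
    p = Path(path).resolve()
    while not p.exists():
        p = p.parent
    return p

def same_device(input_path, output_path):
    """Return True if both paths report the same filesystem device ID.

    The output path may not exist yet, so its nearest existing parent is used.
    """
    in_dev = os.stat(nearest_existing(input_path)).st_dev
    out_dev = os.stat(nearest_existing(output_path)).st_dev
    return in_dev == out_dev
```

For example, `same_device("reads.fast5", "dir/in/hdd_1")` returning `True` for input on an HDD would suggest pointing "-o" at a different disk instead (the file name here is just a placeholder).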