nanoporetech/pod5-file-format

No documentation regarding multi-file pod5 dependency

VasLem opened this issue

I am currently trying to convert a large ONT sample, made up of fast5 files, to pod5. Because I am working on a pre-emptible cluster, where the provider may cancel any task at any time, I want to convert the fast5 files in a distributed fashion, spawning separate machines that each handle a different subset of the fast5 files. My question is whether converting all the fast5 files separately and collecting the resulting pod5 files in the same directory will produce exactly the same output as converting them all together. If so, is there a way to avoid the "is not a multi-read fast5 file." error I get when passing a single fast5 file to a machine (I actually have a prime number of files...), other than copying the file and then discarding the converted clone? Thank you in advance.
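A minimal sketch of one way to split the input across pre-emptible machines, assuming each worker simply runs the converter on its own list of files. `shard_files` is a hypothetical helper, not part of pod5. Round-robin assignment works for any file count, prime or not, and sorting first keeps the assignment deterministic so a restarted worker gets the same shard back.

```python
def shard_files(paths, num_workers):
    """Split `paths` round-robin into `num_workers` shards (hypothetical helper).

    Sorting first makes the assignment deterministic, so a pre-empted
    worker that is restarted receives exactly the same files again.
    Shard sizes differ by at most one, whatever the total file count.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    ordered = sorted(paths)
    # Worker i takes every num_workers-th file, starting at index i.
    return [ordered[i::num_workers] for i in range(num_workers)]
```

For example, 7 files over 3 workers gives shards of 3, 2 and 2 files, so no worker ever has to be handed an artificially padded set.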

The documentation you're looking for is here.

The error you're seeing regarding "single-read" fast5 refers to the type of file, not the number of reads in a multi-read fast5 file. Single-read and multi-read fast5 are different file formats. If you have single-read fast5 files, you must first convert them to multi-read fast5 files. The documentation explains how to do this.
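To check which format a file actually is before sending it to a worker, one option is to look at its top-level HDF5 groups. This is a rough sketch under an assumption about the layout: multi-read files are taken to store every read in a top-level group named `read_<read_id>`, while single-read files hold one read under `Raw/Reads/` next to groups such as `UniqueGlobalKey`. `classify_fast5_layout` is a hypothetical helper; the group names could be read with, for example, `h5py.File(path, "r").keys()`.

```python
def classify_fast5_layout(top_level_keys):
    """Guess a fast5 file's layout from its top-level HDF5 group names.

    Assumption: multi-read files keep each read in a top-level group
    named "read_<read_id>", whereas single-read files keep their one
    read under "Raw/Reads/". Anything else is reported as "unknown".
    """
    keys = list(top_level_keys)
    if keys and all(k.startswith("read_") for k in keys):
        return "multi-read"
    if "Raw" in keys:
        return "single-read"
    return "unknown"
```

A file that cannot be opened as HDF5 at all points to a different problem, such as corruption, rather than the single-read versus multi-read distinction.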

Other than the ordering of reads (which shouldn't matter whatsoever), the output will be the same if you convert each file individually.
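Since only the read order may differ, per-file and combined conversions can be compared as multisets of read IDs. A small sketch, assuming the read IDs have already been extracted from each pod5 file (the extraction step is left out). `same_reads` is a hypothetical helper. Using `Counter` rather than `set` also catches a read that was duplicated, for instance when a pre-empted worker's shard got converted twice.

```python
from collections import Counter


def same_reads(combined_ids, per_file_id_lists):
    """Return True if the per-file outputs contain exactly the same reads
    as one combined conversion, ignoring order.

    Counting occurrences (instead of building a set) means a duplicated
    or missing read makes the comparison fail.
    """
    merged = Counter()
    for ids in per_file_id_lists:
        merged.update(ids)
    return merged == Counter(combined_ids)
```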

Kind regards,
Rich

Great, thank you for the explanations. The file I had trouble with was corrupt. Luckily, I managed to find a healthy copy of it, as recovering an 800 MB corrupted ONT file does not seem to be straightforward. Thanks again!