nanoporetech/tombo

Resquiggle different fast5 format

pterzian opened this issue · 3 comments

Hi Marcus,

I have some questions about the fast5 format read by Tombo. It seems that my dataset does not follow the standard format:

  • Preprocessing told me:

Basecalls exist in specified slot for some reads. Set --overwrite option to overwrite these basecalls.

So I tried to directly resquiggle as I have done before.

  • Resquiggle tells me:

Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use tombo annotate_raw_with_fastqs to add basecalls from FASTQ files to raw FAST5 files.

Then I checked what's inside the fast5 files, and this is what I found:

[Screenshot: tombo_issue]
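For readers who want to run the same kind of inspection without a GUI viewer, here is a minimal sketch in Python. It assumes the third-party h5py package is installed and relies on the usual naming convention: multi-read fast5 files hold one top-level group per read named `read_<id>`, while single-read files have top-level groups such as `Raw`, `UniqueGlobalKey` and `Analyses`. The function names are hypothetical and are not part of Tombo.

```python
def classify_fast5_layout(top_level_keys):
    """Guess the fast5 layout from the names of the top-level HDF5 groups."""
    keys = list(top_level_keys)
    # Multi-read files store each read under its own "read_<id>" group.
    if any(key.startswith("read_") for key in keys):
        return "multi-read"
    # Single-read files keep the signal and metadata directly at the root.
    if "Raw" in keys or "UniqueGlobalKey" in keys:
        return "single-read"
    return "unknown"


def inspect_fast5(path):
    """Open a fast5 file read-only and classify its layout."""
    import h5py  # third-party; imported here so the helper above stays dependency-free

    with h5py.File(path, "r") as handle:
        return classify_fast5_layout(handle.keys())
```

Calling `inspect_fast5("some_read.fast5")` (a placeholder path) returns `"multi-read"`, `"single-read"` or `"unknown"`.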

So it seems to me that the fastq headers are already written in each read block. I tried passing "Basecall_1D_000" to the --basecall-group option, but that didn't work either (I may not have understood the documentation on this part).

Could my issues be related to the fact that my fast5 files are multiplexed?

PS: I recently had other fast5 format issues with the Simpson et al. human ONT project (2017). The fastq records were already inside the fast5 files, and I only managed to resquiggle 2 runs out of 3 (MSssI & PCR), with less than 50% of reads resquiggled (the fast5 files contained 1D & 2D reads).

Thanks a bunch for the support,

Paul

From the screenshot you sent, these appear to be multi-fast5 files. Unfortunately, Tombo only works with the single-read format at this time. As noted in the README (added recently), you can use the multi_to_single_fast5 command from the ont_fast5_api package to convert them.
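As a rough illustration of that conversion step, the sketch below builds and runs the multi_to_single_fast5 command from Python. It assumes ont_fast5_api is installed and that the command accepts the `--input_path`, `--save_path` and `--recursive` options listed in that package's README; check `multi_to_single_fast5 --help` for your installed version. The directory names and the function names are hypothetical.

```python
import subprocess


def build_conversion_command(input_dir, output_dir, recursive=False):
    """Build the argument list for multi_to_single_fast5 (from ont_fast5_api)."""
    command = [
        "multi_to_single_fast5",
        "--input_path", str(input_dir),   # directory holding the multi-read fast5 files
        "--save_path", str(output_dir),   # directory that will receive single-read files
    ]
    if recursive:
        # Also search subdirectories of input_dir for fast5 files.
        command.append("--recursive")
    return command


def convert_to_single_reads(input_dir, output_dir, recursive=False):
    """Run the conversion and raise CalledProcessError if the command fails."""
    subprocess.run(
        build_conversion_command(input_dir, output_dir, recursive),
        check=True,
    )
```

For example, `convert_to_single_reads("multi_fast5", "single_fast5", recursive=True)` writes single-read files into `single_fast5`, which can then be passed to Tombo as the fast5 base directory.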

Thank you for the quick answer. I came across this package recently and thought it might do the trick. I will let you know how it goes.

Worked like a charm!