Resquiggle different fast5 format
pterzian opened this issue · 3 comments
Hi Marcus,
I have some questions about the fast5 format read by Tombo. It seems that my dataset does not follow the standard format:
- Preprocessing told me :
Basecalls exist in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
So I tried to directly resquiggle as I have done before.
- Resquiggle tells me :
Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use
tombo annotate_raw_with_fastqs
to add basecalls from FASTQ files to raw FAST5 files.
Then I checked what's inside the fast5 files, and this is what I found:
So it seems to me that the FASTQ headers are already written in each read block. I tried passing "Basecall_1D_000" to the --basecall-group option, but that didn't work either (I may not have understood the documentation well on this part).
Could my issues be related to the fact that my fast5 files are multiplexed?
PS: I recently had other fast5 format issues with the Simpson et al. human ONT project (2017). The FASTQ records were already in the fast5 files, and I only managed to resquiggle 2 of the 3 runs (MSssI & PCR), with less than 50% of reads resquiggled (the fast5 files contained 1D and 2D reads).
thanks a bunch for the support,
Paul
From the screenshot you sent, these appear to be multi-fast5 files. Unfortunately, Tombo only works with the single-read format at this time. As noted in the README (added recently), you can use the multi_to_single_fast5 command from the ont_fast5_api package.
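For anyone hitting the same error, here is a rough sketch of how one might check which layout a file uses before converting. It assumes that multi-read files hold one top-level HDF5 group per read named read_<id>, while single-read files have top-level groups such as Raw and UniqueGlobalKey. The helper names (classify_fast5_layout, fast5_layout, conversion_command) are made up for illustration, h5py is a third-party dependency, and the flags passed to multi_to_single_fast5 should be confirmed with multi_to_single_fast5 --help for your installed version.

```python
def classify_fast5_layout(top_level_keys):
    """Guess the FAST5 layout from the names of the top-level HDF5 groups.

    Assumption: multi-read files contain only groups named 'read_<id>';
    single-read files contain groups such as 'Raw' and 'UniqueGlobalKey'.
    """
    keys = list(top_level_keys)
    if keys and all(key.startswith("read_") for key in keys):
        return "multi"
    if "Raw" in keys or "UniqueGlobalKey" in keys:
        return "single"
    return "unknown"


def fast5_layout(path):
    """Open a FAST5 file read-only and classify its layout (hypothetical helper)."""
    import h5py  # third-party; imported here so the classifier works without it

    with h5py.File(path, "r") as fast5:
        return classify_fast5_layout(fast5.keys())


def conversion_command(input_dir, output_dir, threads=4):
    """Build the argument list for the ont_fast5_api converter.

    The flags below are assumed from the ont_fast5_api documentation.
    """
    return [
        "multi_to_single_fast5",
        "--input_path", str(input_dir),
        "--save_path", str(output_dir),
        "--threads", str(threads),
        "--recursive",
    ]
```

If fast5_layout reports "multi", the list returned by conversion_command can be passed to subprocess.run, and Tombo can then be pointed at the output directory of single-read files.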
Thank you for the quick answer. I saw this package recently and thought it might do the trick. I will let you know how it goes.
Worked like a charm!