nanoporetech/tombo

tombo resquiggle doesn't work with particular fast5 files

taras2706 opened this issue · 12 comments

Hi,
I downloaded fast5 files for the IVT and Vero-Infected groups (with the corresponding fastq files) from the repository: https://osf.io/8f6n9/. I ran these commands for the fast5 files from the IVT group:

multi_to_single_fast5 -i IVT/fast5/ -s IVT/fast5_single/ -t 2
tombo preprocess annotate_raw_with_fastqs --fast5-basedir IVT/fast5_single/ --fastq-filenames IVT1.all.fastq
tombo resquiggle IVT/fast5_single/ covid.fasta --processes 4 --num-most-common-errors 5

And all goes smoothly. However, when I try to run the same for the Vero-Infected group, the tombo resquiggle command outputs:
[22:30:48] Loading minimap2 reference.
[22:30:48] Getting file list.
[22:30:48] Loading default canonical ***** RNA ***** model.
[22:30:48] Re-squiggling reads (raw signal to genomic sequence alignment).
5 most common unsuccessful read types (approx. %):
    -----   -----   -----   -----   -----
  0%|          | 0/8000 [00:00<?, ?it/s]
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/f/Chrome_Downloads/covid19/tombo/tombo/resquiggle.py", line 1662, in _io_and_mappy_thread_worker
    _io_and_map_read(
  File "/mnt/f/Chrome_Downloads/covid19/tombo/tombo/resquiggle.py", line 1395, in _io_and_map_read
    all_raw_signal = th.get_raw_read_slot(fast5_data)['Signal'][:]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/taras/.local/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 573, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 182, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
  File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)
The same error appears for each thread. The progress bar stops at some number between 0 and 30 and gets stuck.
Is this an issue with the fast5 files from the Vero-Infected group, or with the corresponding fastq file?

This looks potentially like a VBZ compression issue. See the community note here and the github page here. If configuring this plugin does not resolve the issue, please post back here.

Thank you for your quick reply!
I downloaded uncompressed files from the repository, so I don't think the plugin will help if the problem is in uncompressed data. I will try other datasets and post the results here.

The compression is within the hdf5 file (applied to the raw signal via the plugin referenced in this error) and was recently made the default output for all Nanopore reads. It is not obvious whether a particular downloaded file contains compressed signal (I'm not sure whether there is a tool to determine this). Thus I think VBZ compression may still be the source of this issue.
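For what it's worth, one way to check without any dedicated tool is to inspect the HDF5 filter pipeline of the raw signal datasets directly with h5py: VBZ is ONT's registered user-defined filter (id 32020), and the filter id is recorded in the file's metadata even when the plugin needed to decode it is not installed. A minimal sketch (the file path in the usage comment is a hypothetical example):

```python
# Sketch: list the HDF5 filter pipeline of every .../Raw/Signal dataset in a
# single-read fast5 file. A VBZ-compressed read shows filter id 32020 here;
# a gzip-compressed read shows filter id 1 (deflate).
import h5py

VBZ_FILTER_ID = 32020  # ONT's registered HDF5 filter id for VBZ


def signal_filters(fast5_path):
    """Return {dataset_path: [filter ids]} for each Raw/Signal dataset."""
    found = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset) and name.endswith('Raw/Signal'):
            plist = obj.id.get_create_plist()  # dataset creation property list
            found['/' + name] = [plist.get_filter(i)[0]
                                 for i in range(plist.get_nfilters())]

    with h5py.File(fast5_path, 'r') as f:
        f.visititems(visit)
    return found


# Hypothetical usage: True means the file contains VBZ-compressed signal.
# any(VBZ_FILTER_ID in ids for ids in signal_filters('read.fast5').values())
```

Reading the filter metadata only touches the dataset creation property list, so it works on files whose signal cannot actually be decompressed on your system.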

I installed the following package for this purpose:
sudo apt install hdf5-tools
Then, following the linked discussion, I ran h5repack -v -f GZIP=1 input.fast5 output.fast5 on the files that tombo resquiggle could not process (the Vero-Infected group). It produced the warning:
Warning: dataset </read_f65f38af-f792-409f-b475-75f9562a294a/Raw/Signal> cannot be read, user defined filter is not available
and the output file was considerably smaller (30 MB vs. 246 MB), I suppose because the raw signal data were not written to the output file.
In contrast, running the same command on the files that tombo resquiggle processed successfully (the IVT group) produced no warning, and the output file is almost the same size as the input.
Does this mean the Vero-Infected group fast5 files are damaged?

This is likely not the case, since the files were successfully processed by multi_to_single_fast5. You could re-run that command without the vbz compression option to check this.

The original error still suggests (as mentioned in the linked discussion) that the vbz plugin should solve this issue. Have you tried installing the plugin?

I installed the plugin with pip3.6 install pyvbz-1.0.0-cp36-cp36m-linux_x86_64.whl before running h5repack (but I don't see a /usr/local/hdf5/ directory).
Also, I cannot figure out how to run multi_to_single_fast5 without vbz compression.

You may have to set the HDF5_PLUGIN_PATH environment variable (export HDF5_PLUGIN_PATH=[path/to/plugin]) so that the hdf5 library can find the plugin on your system, though the installation should point to the default hdf5 plugin location. Did the command work after the installation?
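The same environment variable can also be set from inside Python before the HDF5 library first needs the filter, which is a convenient way to test whether the plugin resolves the read error. A sketch, assuming a hypothetical plugin install location and a hypothetical helper name:

```python
# Sketch: point HDF5 at the vbz plugin directory, then check whether the raw
# signal can actually be decompressed. The plugin path below is a
# hypothetical example location; adjust it to where the plugin lives on
# your system.
import os

os.environ["HDF5_PLUGIN_PATH"] = "/usr/local/hdf5/lib/plugin"

import h5py  # imported after setting the variable, to be safe


def raw_signal_readable(fast5_path):
    """Return True if every Raw/Signal dataset can be decompressed."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset) and name.endswith("Raw/Signal"):
            obj[:1]  # forces decompression; raises OSError without the plugin

    try:
        with h5py.File(fast5_path, "r") as f:
            f.visititems(visit)
        return True
    except OSError:
        return False
```

If this returns False for the Vero-Infected files but True for the IVT files, it would confirm that the failing reads need a filter plugin that HDF5 cannot find.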

I was mistaken about vbz compression with the multi_to_single_fast5 command. The fast5_subset command exposes compression via the -c argument, but the multi_to_single_fast5 command does not appear to do the same.

I did:

  1. In: pip3.6 --version
    Out: pip 20.1.1 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)
  2. pip3.6 install pyvbz-1.0.0-cp36-cp36m-linux_aarch64.whl; the .whl file was downloaded from the page. The result: pyvbz==1.0.0 from file:///usr/local/pyvbz-1.0.0-cp36-cp36m-linux_x86_64.whl in /home/taras/.local/lib/python3.6/site-packages (1.0.0)
    I have only the hdf5-tools folder, without any lib/plugin subdirectories, so there is no way to use export HDF5_PLUGIN_PATH=/path/to/hdf5(-tools)/lib/plugin
  3. h5repack -v -f GZIP=1 input.fast5 output.fast5, which produced the warning.
    So I probably have some compatibility problem between the vbz_compression plugin and the hdf5-tools package (sudo apt install hdf5-tools).

In case this information is useful: I re-basecalled the files from both groups using MinKNOW (Guppy). Everything stayed the same: one group was processed successfully by h5repack, the other wasn't.

Have you tried re-running the tombo command after installing the vbz plugin?

Sorry, this was my own careless mistake. You were totally right about the vbz plugin.
I had downloaded the wrong file for the plugin installation (pyvbz-1.0.0-cp36-cp36m-linux_x86_64.whl), while the correct one was ont-vbz-hdf-plugin-1.0.0-Linux-x86_64.tar.gz from the Releases page.
You just need to:

  1. tar xvzf ont-vbz-hdf-plugin-1.0.0-Linux-x86_64.tar.gz
  2. export HDF5_PLUGIN_PATH=/your_directory_with_extracted_folder/ont-vbz-hdf-plugin-1.0.0-Linux/usr/local/hdf5/lib/plugin
    After this, both h5repack and tombo resquiggle work without errors/warnings.
    Thanks a lot @marcus1487 for the continued support!

Thank you for your brief & efficient solution @taras2706

Good afternoon, I apologize for my ignorance, I'm new to the command line. How do I use the export command to set the hdf5 plugin path, please?