rrwick/Deepbinner

support for multi-fast5 input?

yjx1217 opened this issue · 5 comments

Hello, I have been testing Deepbinner-0.2.0 (github commit 886efc0) on our local server. It ran well for our older MinION data (single fast5 files; before the recent MinKNOW update) but encountered an error for our new MinION data (multi-fast5 files; after the recent MinKNOW update). So I was wondering whether Deepbinner supports multi-fast5 input yet. Thanks in advance! See below for the error message:

2018-12-28 11:55:39.805480: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Using TensorFlow backend.
Loading /home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/models/EXP-NBD103_read_starts... done
Loading /home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/models/EXP-NBD103_read_ends... done

Looking for fast5 files in /home/jxyue/Projects/LRSDAY-v1.3.0/Project_Jonas/00.Long_Reads/Basecalling_Guppy_out... 318 fast5s found
Classifying fast5s: 0 / 318 (0.0%)
Traceback (most recent call last):
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/bin/deepbinner", line 11, in <module>
    sys.exit(main())
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/deepbinner.py", line 60, in main
    classify(args)
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/classify.py", line 46, in classify
    output_size, args)
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/classify.py", line 127, in classify_fast5_files
    read_id, signal = get_read_id_and_signal(fast5_file)
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/load_fast5s.py", line 27, in get_read_id_and_signal
    read_group = list(hdf5_file['Raw/Reads/'].values())[0]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/h5py/_hl/group.py", line 262, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'
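The `KeyError` comes from the layout difference between the two fast5 flavours: single-read files keep the signal under a top-level `Raw/Reads/<Read_N>/Signal` group (which is what `load_fast5s.py` looks up), while multi-read files wrap each read in a top-level `read_<uuid>/` group with the signal at `read_<uuid>/Raw/Signal`, so `Raw/Reads/` simply doesn't exist. A minimal sketch of a reader that handles both layouts (this is not Deepbinner's actual code, just an illustration; the demo file built at the end is synthetic):

```python
import h5py
import numpy as np

def iter_reads(hdf5_file):
    """Yield (read_id, signal) from either a single- or multi-read fast5."""
    if 'Raw' in hdf5_file:
        # Single-read layout: /Raw/Reads/Read_N/Signal
        read_group = list(hdf5_file['Raw/Reads/'].values())[0]
        yield read_group.attrs['read_id'], read_group['Signal'][:]
    else:
        # Multi-read layout: one /read_<uuid>/ group per read,
        # with the signal at /read_<uuid>/Raw/Signal
        for name, group in hdf5_file.items():
            if name.startswith('read_'):
                raw = group['Raw']
                yield raw.attrs['read_id'], raw['Signal'][:]

# Build a tiny in-memory multi-read file to demonstrate the lookup.
with h5py.File('demo.fast5', 'w', driver='core', backing_store=False) as f:
    raw = f.create_group('read_0001/Raw')
    raw.attrs['read_id'] = 'read-one'
    raw.create_dataset('Signal', data=np.arange(5, dtype=np.int16))
    reads = list(iter_reads(f))
    print(reads[0][0])  # read-one
```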

Loading classifications... done
0 total classifications found

Writing reads: 0
Traceback (most recent call last):
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/bin/deepbinner", line 11, in <module>
    sys.exit(main())
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/deepbinner.py", line 64, in main
    bin_reads(args)
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/bin.py", line 32, in bin_reads
    write_read_files(args.reads, classifications, out_filenames, input_type)
  File "/home/jxyue/Projects/LRSDAY-v1.3.0/build/PATCH/Deepbinner-0.2.0/py3_virtualenv_deepbinner/lib/python3.4/site-packages/deepbinner/bin.py", line 146, in write_read_files
    out_files[class_name].write(read_line_1)
KeyError: 'not found'

I also ran into the same problem. Any progress on this?

I have the same problem...

In case anyone is still looking for a solution: you can use multi_to_single_fast5 from https://github.com/nanoporetech/ont_fast5_api to convert multi-read fast5s to single-read fast5s, and then run Deepbinner on the result.
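For anyone scripting this conversion step, here's a hedged sketch of invoking multi_to_single_fast5 from Python. The flags are the ones documented in ont_fast5_api's README; the directory names are placeholders you'd replace with your own paths:

```python
import shutil
import subprocess

# multi_to_single_fast5 is installed as a console script by ont_fast5_api.
cmd = [
    'multi_to_single_fast5',
    '--input_path', 'multi_fast5_dir',   # directory of multi-read fast5s
    '--save_path', 'single_fast5_dir',   # output directory for single-read files
    '--threads', '4',
]

if shutil.which(cmd[0]):  # only run if ont_fast5_api is actually installed
    subprocess.run(cmd, check=True)
else:
    print('multi_to_single_fast5 not on PATH; install ont_fast5_api first')
```

After the conversion finishes, point `deepbinner classify` at `single_fast5_dir` as usual.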

Hi all!

@bsaintjo gave the simplest answer: just convert your reads into single-read fast5s first. But that doesn't help if you want to run Deepbinner in real time, so I've just pushed an update that makes the `deepbinner realtime` command work with multi-read fast5 files. It uses multi_to_single_fast5 internally, so you'll still need that installed.

See more info here in the README.

Ryan

Hi @rrwick ,

Excuse my ignorance, but I don't see why Deepbinner can't classify multi-fast5s. E.g. you mention in the README:

if one fast5 file contains reads from more than one barcode, then it cannot simply be moved into a bin

But each read entry within a multi-fast5 has a unique ID, so surely that's all you need? I.e. in the classification file you just record the read ID and the barcode it maps to, and then the read with that ID in Guppy's fastq output gets binned accordingly?

Obviously, it would require a change in your method of handling fast5 files, though.
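The binning idea described above can be sketched in a few lines: build a read-ID-to-barcode map from a classification table, then route each fastq record by the ID in its header. This is only an illustration of the commenter's suggestion, not Deepbinner's implementation; the classification columns and read names are made up:

```python
# Hypothetical classification table: read_id <TAB> barcode
classification_lines = [
    'read-one\tbarcode01',
    'read-two\tbarcode02',
]
read_to_barcode = dict(line.split('\t') for line in classification_lines)

# A tiny in-memory fastq (4 lines per record).
fastq = [
    '@read-one some description', 'ACGT', '+', '!!!!',
    '@read-two some description', 'TTTT', '+', '!!!!',
]

bins = {}
for i in range(0, len(fastq), 4):
    read_id = fastq[i][1:].split()[0]  # the ID is the first word after '@'
    barcode = read_to_barcode.get(read_id, 'unclassified')
    bins.setdefault(barcode, []).extend(fastq[i:i + 4])

print(sorted(bins))  # ['barcode01', 'barcode02']
```

In a real pipeline each bin's records would be written to a per-barcode fastq file rather than kept in memory.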