popitsch/nanopanel2

basecall_grp?

ChristopherRichie opened this issue · 6 comments

Greetings!
I am trying to run nanopanel2 on my samples which were basecalled and demultiplexed with the following versions:
MinKNOW 21.02.2
MinKNOW Core 4.2.4
Bream 6.1.10
Guppy 4.3.4

And I am just trying to run np2 with the data for barcode02.
"fast5_dir": "//data/fast5_pass/barcode02", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "//data/fastq_pass/barcode02", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured

I am getting this error:

KeyError: "Unable to open object (object 'Basecall_1D_001' doesn't exist)"
I do not see any reference to this term in your paper, and I cannot find anything with a Google search.

This error still appears even after I remove the following line from the config file:
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files

my config file is attached
config.json.zip

Any ideas?

I am hoping to use this program to analyze somatic variation in rat neurons after CRISPR mutagenesis.

Thanks
chris

Hi and sorry for the late reply,

The configuration property “basecall_grp” should be set to the name of the basecall group that guppy wrote to its FAST5 output files. FAST5 files are in fact HDF5 files, and the basecall group is a key inside them. You can inspect HDF5 files, e.g., with ‘h5ls’ (see https://support.hdfgroup.org/products/hdf5_tools/).
Could you check and let me know what basecall group IDs you see in your data?

By default, guppy uses ‘Basecall_1D_000’ for the first basecalling run you apply to these files, ‘Basecall_1D_001’ for the second one, and so on. ‘Basecall_1D_001’ is the default value of this property, which is why removing the line from the config file does not change anything.
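To check which basecall groups a file actually contains, something along these lines may help. It is only a minimal sketch: the helper names `list_basecall_groups` and `basecall_groups_in_file` are made up for illustration and are not part of np2, and it assumes a multi-read FAST5 layout in which each top-level key is one read that holds an "Analyses" group.

```python
import re

# Group names as described above: Basecall_1D_000, Basecall_1D_001, ...
BASECALL_GRP_PATTERN = re.compile(r"^Basecall_1D_\d+$")


def list_basecall_groups(read):
    """Return the sorted basecall group names found under read["Analyses"].

    `read` can be any mapping-like object: an h5py group for a single read,
    or a plain nested dict (handy for trying this out without a FAST5 file).
    """
    if "Analyses" not in read:
        return []
    return sorted(
        name for name in read["Analyses"].keys()
        if BASECALL_GRP_PATTERN.match(name)
    )


def basecall_groups_in_file(fast5_file):
    """List the basecall groups of the first read in a multi-read FAST5 file.

    Hypothetical convenience wrapper; h5py is imported lazily so that
    list_basecall_groups() stays usable without it.
    """
    import h5py  # third-party dependency

    with h5py.File(fast5_file, "r") as f:
        first_read_name = next(iter(f.keys()))
        return list_basecall_groups(f[first_read_name])
```

Whatever name this reports for the basecalling run you want to analyse is the value to put into "basecall_grp" in the config file.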

HTH, BW niko

Hi Christopher
Can you please run the following in a Python 3.7 console:

import h5py

def print_h5_keys_recursive(a, pad=''):
    if hasattr(a, 'keys'):
        for k in list(a.keys()):
            print(pad, k)
            if k in a:
                print_h5_keys_recursive(a[k], pad+'  ')

def print_h5_keys(fast5_file):
    """ print an exemplary h5 key structure for the passed fast5 file """
    # open read-only; the context manager closes the file handle afterwards
    with h5py.File(fast5_file, 'r') as f:
        first_read_name = next(iter(f.keys()))
        print_h5_keys_recursive(f[first_read_name])

print_h5_keys('myfile.fast5') # replace by one of your FAST5 filenames here

Here is an example output from one of my FAST5 files:

>>> print_h5_keys('ACD011_pass_65241621_98.fast5')
 Analyses
   Basecall_1D_000
     BaseCalled_template
       Fastq
     Summary
       basecall_1d_template
   Basecall_1D_001
     BaseCalled_template
       Fastq
       Move
       Trace
     Summary
       basecall_1d_template
   Segmentation_000
     Summary
       segmentation
   Segmentation_001
     Summary
       segmentation
 Raw
   Signal
 channel_id
 context_tags
 tracking_id

Hi

  • re. your FAST5 structure: Np2 expects the following sections in the FAST5 file:
    [read_id]["Analyses"][basecall_grp]["BaseCalled_template"]["Trace"] and
    [read_id]["Analyses"][basecall_grp]["BaseCalled_template"]["Move"].
    Np2 was developed and tested with guppy 3.6.1, and I haven't tested it with newer guppy versions yet. With guppy 3.x versions, these sections were written when the '--trace_categories_logs Move' parameter was set (see the README for an example guppy 3.6.1 command line). Can you try re-basecalling your data with a similar command line using your guppy version and then re-run the script? There should then be a new basecall group that includes the Trace and Move sections.

  • re. the error: it seems as if last does not work properly on your system. I'd need to see the log file to tell what's going on.
    Having said that, as our evaluation clearly showed that minimap2 is the best mapper for np2 data analysis, I'd recommend disabling last and ngm in the config (you can comment those sections out as shown in the example config file)...
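
Regarding the FAST5 structure mentioned in the first point, a quick way to see which basecall groups of a read contain both the Trace and Move sections could look like the sketch below. The function name `groups_with_trace_and_move` is hypothetical and not part of np2. The function only relies on `keys()`, `in` and `[]`, so it should accept an h5py group as well as a plain nested dict.

```python
REQUIRED_SECTIONS = ("Trace", "Move")


def groups_with_trace_and_move(read):
    """Return basecall groups whose BaseCalled_template has Trace and Move.

    `read` is a mapping-like object for one read, i.e. the level that
    contains "Analyses", "Raw", etc.
    """
    usable = []
    analyses = read["Analyses"] if "Analyses" in read else {}
    for name in sorted(analyses.keys()):
        if not name.startswith("Basecall_1D_"):
            continue  # skip Segmentation_* and other analysis groups
        group = analyses[name]
        if "BaseCalled_template" not in group:
            continue
        template = group["BaseCalled_template"]
        if all(section in template for section in REQUIRED_SECTIONS):
            usable.append(name)
    return usable
```

With h5py, this could be called as `groups_with_trace_and_move(f[read_name])` inside a `with h5py.File(path, 'r') as f:` block. An empty result would suggest that the data has to be re-basecalled as described above before np2 can use it.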