basecall_grp?

Question


ChristopherRichie opened this issue 3 years ago · 6 comments

Greetings!
I am trying to run nanopanel2 on my samples which were basecalled and demultiplexed with the following versions:
MinKNOW 21.02.2
MinKNOW Core 4.2.4
Bream 6.1.10
Guppy 4.3.4

For now, I am just trying to run np2 on the data for barcode02:
"fast5_dir": "//data/fast5_pass/barcode02", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "//data/fastq_pass/barcode02", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured

I am getting this error:

KeyError: "Unable to open object (object 'Basecall_1D_001' doesn't exist)"
I do not see any reference to this term in your paper, and I cannot find anything with a Google search.

This error appears even after I remove the following line from the config file:
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files

My config file is attached:
config.json.zip

Any ideas?

I am hoping to use this program to analyze somatic variation in rat neurons after CRISPR mutagenesis.

Thanks
chris

Answer 1 · 2021-09-20T10:28:37.000Z

Hi and sorry for the late reply,

The configuration property “basecall_grp” should be set to the name of the basecall group used as a key in guppy's FAST5 output files, which are in fact HDF5 files. You can inspect HDF5 files, e.g., with ‘h5ls’ (see https://support.hdfgroup.org/products/hdf5_tools/).
Could you check and let me know what basecall group IDs you see in your data?

By default, Guppy uses ‘Basecall_1D_000’ for the first basecalling run you apply to these files, ‘Basecall_1D_001’ for the second one, and so on. ‘Basecall_1D_001’ is also the default value of “basecall_grp” in nanopanel2, which is why removing this line from the config file does not change anything.

HTH, BW niko
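Before editing `basecall_grp`, it can help to list which basecall groups a file actually contains. The following is a minimal sketch, not part of nanopanel2; `list_basecall_groups`, `latest_basecall_group` and `print_basecall_groups` are hypothetical helpers. It assumes the multi-read FAST5 layout (each top-level key is a read with an `Analyses` group below it) and falls back to a root-level `Analyses` group as found in single-read files. The core functions only use dict-style access, so they accept an open `h5py.File` as well as plain nested dicts.

```python
import re

# Guppy numbers successive basecalling runs: Basecall_1D_000, Basecall_1D_001, ...
BASECALL_RE = re.compile(r"^Basecall_1D_\d{3}$")


def _analyses_group(fast5):
    """Return the Analyses group of a single-read file, or of the first read
    in a multi-read file (assumed layout: <read_id>/Analyses/...)."""
    if "Analyses" in fast5:
        return fast5["Analyses"]
    first_read = next(iter(fast5.keys()), None)
    if first_read is None or "Analyses" not in fast5[first_read]:
        return {}
    return fast5[first_read]["Analyses"]


def list_basecall_groups(fast5):
    """Basecall group names under Analyses, sorted by run number."""
    # zero-padded suffixes make lexicographic order equal to run order
    return sorted(k for k in _analyses_group(fast5).keys() if BASECALL_RE.match(k))


def latest_basecall_group(fast5):
    """Group written by the most recent basecalling run, or None if there is none."""
    groups = list_basecall_groups(fast5)
    return groups[-1] if groups else None


def print_basecall_groups(path):
    """Open a FAST5 (HDF5) file with h5py and print its basecall groups."""
    import h5py  # imported here so the helpers above also work without h5py

    with h5py.File(path, "r") as f:
        print(path, list_basecall_groups(f), "latest:", latest_basecall_group(f))
```

The reported name is a candidate for `basecall_grp`; whether that group also holds all the data nanopanel2 needs is a separate question.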

Answer 2 · 2021-09-20T15:51:01.000Z

Hi Niko! Thanks for your reply. I have run h5ls on the files in the fast5_pass folder. These files were processed and demultiplexed by Guppy during the run.

$ cd barcode05
$ ls
ahd405_pass_barcode05_63d9a674_0.fast5   ahd405_pass_barcode05_63d9a674_12.fast5  ahd405_pass_barcode05_63d9a674_15.fast5  ahd405_pass_barcode05_63d9a674_18.fast5  ahd405_pass_barcode05_63d9a674_3.fast5  ahd405_pass_barcode05_63d9a674_6.fast5  ahd405_pass_barcode05_63d9a674_9.fast5
ahd405_pass_barcode05_63d9a674_10.fast5  ahd405_pass_barcode05_63d9a674_13.fast5  ahd405_pass_barcode05_63d9a674_16.fast5  ahd405_pass_barcode05_63d9a674_1.fast5   ahd405_pass_barcode05_63d9a674_4.fast5  ahd405_pass_barcode05_63d9a674_7.fast5
ahd405_pass_barcode05_63d9a674_11.fast5  ahd405_pass_barcode05_63d9a674_14.fast5  ahd405_pass_barcode05_63d9a674_17.fast5  ahd405_pass_barcode05_63d9a674_2.fast5   ahd405_pass_barcode05_63d9a674_5.fast5  ahd405_pass_barcode05_63d9a674_8.fast5
$ h5ls ahd405_pass_barcode05_63d9a674_0.fast5
read_00d670ad-cc20-4321-ad7c-1bfc4dd13f29 Group
read_00f3cbba-bc4f-4058-94f8-5a9ec22a8ba4 Group
read_00fc1107-dfea-473c-b73b-4143e3d946f4 Group
read_0135a8a8-035e-46f2-a234-b6252ee3a526 Group
read_015615ef-8673-42bc-97a8-86e11df7a65e Group
… many, many more.
$ cd ../barcode06
$ ls
ahd405_pass_barcode06_63d9a674_0.fast5
$ h5ls ahd405_pass_barcode06_63d9a674_0.fast5
read_12ea8f99-7bc4-4fa3-a888-08e1481ffcb1 Group
read_295e2c33-38f5-4cd6-88ef-db29de47cd64 Group
read_340fccd6-1100-4946-8ad0-7cf16e24c3a6 Group
read_37bcc780-d898-41db-9dc4-0ec2381566f0 Group
read_44f26b88-f6ba-4d96-92a8-349423f9c1a8 Group
read_6832e5cd-51e3-4f3c-800e-22370d34ecd6 Group

Is the second column where you would expect to find “Basecall_1D_000” instead of “Group”? I am still eager to get this working. Please let me know if I need to look somewhere else.
Just in case, I have added the Run Info from the summary PDF below. THANKS!
chris

Run Info
- Host Name: MC-110461 (localhost)
- Experiment Name: CR2839
- Sample ID: LVF910_pool
- Run ID: f581bd41-431f-431f-91e0-1fd23e60e841
- Flow Cell Id: ahd405
- Start Time: September 7, 18:34
- Run Length: 18h 32m

Run Summary
- Reads Generated: 116.6 K
- Passed Bases: 199.96 Mb
- Failed Bases: 40.42 Mb
- Estimated Bases: 194.48 Mb

Run Parameters
- Flow Cell Type: FLO-FLG001
- Kit: SQK-PBK004
- Initial Bias Voltage: -180 mV
- FAST5 Output: Enabled
- FASTQ Output: Enabled
- BAM Output: Disabled
- Active Channel Selection: Enabled
- Basecalling: on
- Specified Run Length: 24 hours
- FAST5 Reads per File: 1000
- FAST5 Output Options: zlib_compress,fastq,raw
- FASTQ Reads per File: 1000
- Mux Scan Period: 1 hour 30 minutes
- Reserved Pores: 0 %
- Basecall Model: High-accuracy basecalling
- Barcoding: barcoding_kits=["SQKPBK004"], trim_barcodes="off", require_barcodes_both_ends="off", detect_mid_strand_barcodes="off", min_score=40
- Read Filtering: min_qscore=7

Versions
- MinKNOW: 21.02.2
- MinKNOW Core: 4.2.4
- Bream: 6.1.10
- Guppy: 4.3.4
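Without options, `h5ls` only lists the top level of a file. In these multi-read FAST5 files every top-level entry is one read (`read_<uuid>`), so each line ends in `Group`; the basecall groups sit further down, under `read_<uuid>/Analyses`, and `h5ls -r` would print the full tree. To check all reads in a barcode folder instead of a single one, a sketch like the following counts how many reads carry each basecall group. `count_basecall_groups` and `summarize_dir` are hypothetical helpers, and the multi-read layout is an assumption based on the output above.

```python
from collections import Counter
from pathlib import Path


def count_basecall_groups(fast5):
    """For a multi-read FAST5 (read_<uuid>/Analyses/...), count how many reads
    contain each Analyses subgroup whose name starts with 'Basecall_'."""
    counts = Counter()
    for read_id in fast5.keys():
        read = fast5[read_id]
        if "Analyses" not in read:
            continue  # read without any analysis results
        for name in read["Analyses"].keys():
            if name.startswith("Basecall_"):
                counts[name] += 1
    return counts


def summarize_dir(fast5_dir):
    """Print per-file basecall group counts for every *.fast5 in a directory."""
    import h5py  # only needed for real files; the counting logic works on dicts

    for path in sorted(Path(fast5_dir).glob("*.fast5")):
        with h5py.File(str(path), "r") as f:
            print(path.name, dict(count_basecall_groups(f)))
```

For example, `summarize_dir('fast5_pass/barcode05')` would print one line per file; a group that appears with the full read count in every file is a consistent choice for `basecall_grp`.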

Answer 3 · 2021-09-21T12:32:18.000Z

Hi Christopher
Can you please run the following in a python 3.7 console:

import h5py

def print_h5_keys_recursive(a, pad=''):
    """ recursively print the keys of an h5py group, indenting each level """
    if hasattr(a, 'keys'):  # groups have keys, datasets do not
        for k in list(a.keys()):
            print(pad, k)
            if k in a:  # skips keys that cannot be resolved (e.g., dangling links)
                print_h5_keys_recursive(a[k], pad+'  ')

def print_h5_keys(fast5_file):
    """ print an exemplary h5 key structure (first read only) for the passed fast5 file """
    with h5py.File(fast5_file, 'r') as f:  # closes the file when done
        first_read_name = next(iter(f.keys()))
        print_h5_keys_recursive(f[first_read_name])

print_h5_keys('myfile.fast5') # replace by one of your FAST5 filenames here

Here is an example output from one of my FAST5 files:

>>> print_h5_keys('ACD011_pass_65241621_98.fast5')
 Analyses
   Basecall_1D_000
     BaseCalled_template
       Fastq
     Summary
       basecall_1d_template
   Basecall_1D_001
     BaseCalled_template
       Fastq
       Move
       Trace
     Summary
       basecall_1d_template
   Segmentation_000
     Summary
       segmentation
   Segmentation_001
     Summary
       segmentation
 Raw
   Signal
 channel_id
 context_tags
 tracking_id
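The indented printout can also be produced as full slash-separated paths, which are easier to grep and to compare with the FAST5 paths nanopanel2 reads. This is a sketch under the same assumptions as the script above (`iter_paths` and `print_paths` are hypothetical names); like `print_h5_keys_recursive`, it treats anything with `keys()` as a group, so it runs on an h5py group or on nested dicts. h5py groups also offer a built-in `visit(callable)` method that walks the tree in a similar way.

```python
def iter_paths(node, prefix=""):
    """Yield 'a/b/c'-style paths for every group or dataset below `node`.
    Objects without .keys() (h5py datasets, plain values) are leaves."""
    if not hasattr(node, "keys"):
        return
    for key in node.keys():
        path = f"{prefix}/{key}" if prefix else key
        yield path
        if key in node:  # same guard as in print_h5_keys_recursive
            yield from iter_paths(node[key], path)


def print_paths(fast5_file):
    """Print the paths below the first read of a multi-read FAST5 file."""
    import h5py  # imported lazily; iter_paths itself does not need it

    with h5py.File(fast5_file, "r") as f:
        first_read_name = next(iter(f.keys()))
        for path in iter_paths(f[first_read_name]):
            print(path)
```

For the example file above, this would print lines such as `Analyses/Basecall_1D_001/BaseCalled_template/Move`.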

Answer 4 · 2021-09-21T15:14:20.000Z

Hi Niko, I used your script and got this output:

>>> print_h5_keys('ahd405_pass_barcode05_63d9a674_0.fast5') # replace by one of your FAST5 filenames here
 Analyses
   Basecall_1D_000
     BaseCalled_template
       Fastq
     Summary
       basecall_1d_template
   Segmentation_000
     Summary
       segmentation
 Raw
   Signal
 channel_id
 context_tags
 tracking_id

I modified the config to use the new group name “Basecall_1D_000” and re-ran nanopanel2 with the repaired config file, but got a new error:

$ singularity run --bind //data/chrisr/nanopanel2_data://data ./nanopanel2/nanopanel2_1.01.sif call --conf //data/config.json.CR2839 --out //data/nanopanel2_output
Creating dir //data/nanopanel2_output/CR2839_nanopanel2/
Extracting FASTQ data for all: 100%|██████████| 28/28 [00:13<00:00, 2.15it/s]
Error: file //data/nanopanel2_output/CR2839_nanopanel2/CR2839_nanopanel2.last.sam.tmp was not found! Exiting...

The output folder contained these files:
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.mm2.bam
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.mm2.bam.bai
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.ngms.bam
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.ngms.bam.bai
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\nanopanel2.log
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.fq.gz
S:\nanopanel2_data\nanopanel2_output\CR2839_nanopanel2\CR2839_nanopanel2.idx

I have attached the config and the log files (I am not sure if attachments work via an email reply). I see no evidence of output from “last”. Any ideas on what to check next?
I have also attached my reference file (FASTA format). Do I need to modify the reference or the “ROI_intervals” parameters to match names or lengths? My reference is on the small side (by typical ONT standards).

In case I haven’t mentioned it yet, I am running the Singularity container "S:\nanopanel2\nanopanel2_1.01.sif" in an sinteractive session with these resources:

sinteractive --cpus-per-task=8 --mem=64g
singularity run --bind //data/chrisr/nanopanel2_data://data ./nanopanel2/nanopanel2_1.01.sif call --conf //data/config.json.CR2839 --out //data/nanopanel2_output

I appreciate your support in getting this to run.
Chris

Answer 5 · 2021-09-22T08:54:56.000Z

Hi

re. your FAST5 structure: Np2 expects the following sections in the FAST5 file:
[read_id]["Analyses"][basecall_grp]["BaseCalled_template"]["Trace"] and
[read_id]["Analyses"][basecall_grp]["BaseCalled_template"]["Move"].
Np2 was developed and tested with guppy 3.6.1, and I haven't tested it with newer guppy versions yet. With guppy 3.x versions, these sections were written when setting the '--trace_categories_logs Move' parameter (see the README for an example command line for guppy 3.6.1). Can you try re-basecalling your data with a similar command line using your guppy version and then re-run the script? There should then be a new basecall group which includes the Trace and Move sections.
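Given these two required paths, a quick pre-flight check can tell which basecall group in a file is usable before nanopanel2 is started. This is a minimal sketch, assuming the multi-read layout and only the two paths listed above; `has_trace_and_move` and `usable_basecall_groups` are hypothetical helpers, not nanopanel2 functions. Applied to the structures posted in this thread, Niko's example file would report `Basecall_1D_001`, while Christopher's file (whose `Basecall_1D_000` only contains `Fastq`) would report no usable group.

```python
REQUIRED = ("Trace", "Move")  # the two entries Np2 expects per basecall group


def has_trace_and_move(read, basecall_grp):
    """True if read/Analyses/<basecall_grp>/BaseCalled_template holds both
    a Trace and a Move entry."""
    node = read
    for key in ("Analyses", basecall_grp, "BaseCalled_template"):
        if key not in node:
            return False
        node = node[key]
    return all(name in node for name in REQUIRED)


def usable_basecall_groups(fast5):
    """Basecall groups of the first read that satisfy has_trace_and_move."""
    first_read = next(iter(fast5.keys()), None)
    if first_read is None:
        return []
    read = fast5[first_read]
    if "Analyses" not in read:
        return []
    return sorted(
        grp for grp in read["Analyses"].keys()
        if grp.startswith("Basecall_") and has_trace_and_move(read, grp)
    )
```

With h5py, pass an open file, e.g. `with h5py.File(name, 'r') as f: print(usable_basecall_groups(f))`; an empty list suggests the data needs to be re-basecalled before running np2.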
re. the error: it seems as if last does not work properly on your system. I'd need to see the log file to find out what's going on.
Having said that, as our evaluation clearly showed that minimap2 is the best mapper for np2 data analysis, I'd recommend disabling last and ngm in the config (you can comment those sections out as shown in the example config file)...

Answer 6 · 2021-09-24T15:37:46.000Z

Hi Niko, I have successfully re-basecalled the data, and the files now contain the Trace/Move information. I will try np2 again this weekend. Thanks for your assistance so far.