slow5tools degrade (1.3.0) does not detect ULK kit?
For the following s/blow5 header made with blue-crab (0.1.2), slow5tools degrade (1.3.0) does not seem to recognize the ULK kit.
#slow5_version 0.2.0
#num_read_groups 1
@acquisition_id ca82937006c473b34e065122cf6a8ed73c55ce18
@acquisition_start_time 2024-06-26 09:25:49.033000+00:00
@adc_max 2047
@adc_min 0
@asic_id FFFFFC0FE73734C0
@asic_id_eeprom FFFFFC0FE73734C0
@asic_temp 28.228447
@asic_version Unknown
@barcoding_enabled 0
@basecall_config_filename dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_hac_prom.cfg
@configuration_version 5.9.18
@data_source real_device
@device_id A
@device_type p2_solo
@distribution_status stable
@distribution_version 24.02.16
@exp_script_name sequencing/sequencing_PRO114_DNA_e8_2_400K_long_read:FLO-PRO114M:SQK-ULK114:400
@exp_script_purpose sequencing_run
@exp_start_time 2024-06-26T11:25:49.033544+02:00
@experiment_name Blood-WGS_ONT_24062024
@experiment_type genomic_dna
@flow_cell_id PAU99561
@flow_cell_product_code FLO-PRO114M
@fpga_board_id 0018f5206e51685c
@fpga_firmware_version 2.1.0
@guppy_version 7.3.11+0112dde09
@heatsink_temp 34.045727
@host_product_code GRD-MK1
@host_product_serial_number GXB04189
@hostname GXB04189
@installation_type nc
@is_simulated 0
@local_basecalling 1
@operating_system ubuntu 20.04
@package bream4
@package_version 7.9.8
@protocol_group_id Blood-WGS_ONT_24062024
@protocol_name sequencing/sequencing_PRO114_DNA_e8_2_400K_long_read:FLO-PRO114M:SQK-ULK114:400
@protocol_run_id d2c3e09e-da67-4bba-aecf-0c004874a607
@protocol_start_time 2024-06-26T11:24:03.627544+02:00
@protocols_version 7.9.8
@run_id ca82937006c473b34e065122cf6a8ed73c55ce18
@sample_frequency 5000
@sample_id Blood-WGS_L3_26062024
@sample_rate 5000
@selected_speed_bases_per_second 400
@sequencer_hardware_revision HW-30
@sequencer_position P2S-00581-A
@sequencer_position_type PromethION
@sequencer_product_code PRO-SEQ002
@sequencer_serial_number P2S-00581
@sequencing_kit sqk-ulk114
@software MinKNOW 24.02.16 (Bream 7.9.8, Core 5.9.12, Dorado 7.3.11+0112dde09)
@system_name GXB04189
@system_type GridION Mk1
@usb_config fx3_0.0.0#fpga_0.0.0#unknown#unknown
@usb_firmware_version 2.5.1
@version 5.9.12
~/bin/slow5tools-v1.3.0/slow5tools degrade -s ex-zd -c zstd PAU99561_d2c3e09e_ca829370_21.blow5 -o PAU99561_d2c3e09e_ca829370_21.3.blow5
[degrade_main::WARNING] This tool performs lossy compression which is an irreversible operation. Just making sure it is intended.
[slow5_hdr_get_dataset] Not detected: MinION DNA lsk114 5kHz
[slow5_hdr_get_dataset] Not detected: PromethION DNA lsk109 4kHz
[slow5_hdr_get_dataset] Not detected: PromethION DNA lsk114 4kHz
[slow5_hdr_get_dataset] Not detected: PromethION DNA lsk114 5kHz
[slow5_hdr_get_dataset] Not detected: PromethION RNA rna002 3kHz
[slow5_hdr_get_dataset] Not detected: PromethION RNA rna004 4kHz
[slow5_hdr_get_dataset::ERROR] No suitable bits suggestion
[degrade_main::ERROR] Use option -b to manually specify
~/bin/slow5tools-v1.3.0/slow5tools degrade -s ex-zd -c zstd PAU99561_d2c3e09e_ca829370_21.blow5 -o PAU99561_d2c3e09e_ca829370_21.3.blow5 -b4
[degrade_main::WARNING] This tool performs lossy compression which is an irreversible operation. Just making sure it is intended.
[slow5_encode_signal_press::WARNING] Signal compression method ex-zd is new. While it is stable, just keep an eye. At src/slow5_press.c:116
[main] cmd: /home/jelber43/bin/slow5tools-v1.3.0/slow5tools degrade -s ex-zd -c zstd PAU99561_d2c3e09e_ca829370_21.blow5 -o PAU99561_d2c3e09e_ca829370_21.3.blow5 -b4
[main] real time = 40.577 sec | CPU time = 117.731 sec | peak RAM = 3.700 GB
If it is possible to parse the ULK part, that would be fine. Otherwise, could the tool show the user which bit values to use for different datasets?
Hello, we are parsing the ULK part properly, but the tool checks whether the kit matches one of those we have exhaustively tested. As this is lossy compression, we are being very pedantic to prevent users from inadvertently affecting their data. Kits will be added as we come across them and test them. I have not had access to GridION sqk-ulk114 data, but the suitable -b is very likely 3. Is this a publicly available dataset?
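To illustrate the kind of check described above, here is a minimal Python sketch. It is not the slow5tools implementation. The set of tested datasets is copied from the "Not detected" log lines earlier in this issue; which header fields feed the comparison (`sequencer_position_type`, `experiment_type`, `sequencing_kit`, `sample_frequency`) is an assumption made for illustration, and the function names are hypothetical.

```python
# Dataset labels taken from the "Not detected" log lines in this issue.
TESTED = {
    ("MinION", "DNA", "lsk114", 5000),
    ("PromethION", "DNA", "lsk109", 4000),
    ("PromethION", "DNA", "lsk114", 4000),
    ("PromethION", "DNA", "lsk114", 5000),
    ("PromethION", "RNA", "rna002", 3000),
    ("PromethION", "RNA", "rna004", 4000),
}


def parse_header(text):
    """Parse '@key value' header lines (tab- or space-separated) into a dict."""
    attrs = {}
    for line in text.splitlines():
        if line.startswith("@"):
            parts = line[1:].split(None, 1)  # split on the first whitespace run only
            if parts:
                attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return attrs


def dataset_key(attrs):
    """Build a (platform, molecule, kit, sample rate) key.

    The choice of header fields here is an assumption, not slow5tools' logic.
    """
    platform = attrs.get("sequencer_position_type", "")
    molecule = "RNA" if "rna" in attrs.get("experiment_type", "").lower() else "DNA"
    kit = attrs.get("sequencing_kit", "").lower()
    if kit.startswith("sqk-"):
        kit = kit[len("sqk-"):]
    rate = int(float(attrs.get("sample_frequency", "0")))
    return (platform, molecule, kit, rate)


def is_tested(attrs):
    """Return True only if the dataset is one of the tested combinations."""
    return dataset_key(attrs) in TESTED
```

With the header from this issue, `dataset_key` yields `("PromethION", "DNA", "ulk114", 5000)`, which is not in the tested set, so a strict check of this kind refuses to suggest a value and the user has to pass `-b` manually.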
As per the Twitter conversation (https://x.com/jpelbers/status/1842484817885073502), here is a Dropbox link to ~30x average coverage ONT ULK reads for HG002 chr22 (based on alignment to hg38 no alts). They were HG002 cells with DNA extracted following a BioNano DNA extraction protocol, undergoing ONT ULK library preparation, then sequenced on an ONT PromethION P2 solo device with an r10.4.1 flowcell connected to an ONT GridION for data acquisition. Provided is an ex-zd, zstd blow5 file that you can access with
wget 'https://www.dropbox.com/scl/fi/8s0p4ttpuy1amiuulzu3v/WGS_HG002_Bionano_recover_13022024.chr22.readids.blow5?rlkey=395acerl9ewgyqkafi7g15ipe&st=giubcawn' -O WGS_HG002_Bionano_recover_13022024.chr22.blow5
on a computer with wget.
Best,
Jean Elbers
*NOTE that the blow5 file on Dropbox does not match the header above in this GitHub issue, as I realized those squiggles did not belong to HG002.
Thanks, we will have a look at this as soon as possible.
OK, @KavinduJayas did the tests and 3 bits seems to be the suitable number of bits to remove.
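For readers unfamiliar with what removing bits means here, the sketch below rounds each raw integer sample to the nearest multiple of 2^b, so the b least significant bits become zero and the signal compresses better. This is an illustration only: round-to-nearest is an assumption suggested by the "rounded_N" file names below, the function name `round_lsb` is hypothetical, and the exact rounding and range handling in slow5tools may differ.

```python
def round_lsb(sample, bits):
    """Round an integer sample to the nearest multiple of 2**bits.

    Ties (exactly halfway) round up. With bits <= 0 the sample is unchanged.
    """
    if bits <= 0:
        return sample
    half = 1 << (bits - 1)
    return ((sample + half) >> bits) << bits


signal = [503, 498, 511, 476]  # made-up raw values for illustration
print([round_lsb(s, 3) for s in signal])  # prints [504, 496, 512, 480]
```

With `bits=3` every output value is a multiple of 8, so each sample moves by at most 4 raw units from its original value.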
Identity scores:
plot_WGS_HG002_Bionano_recover_13022024_rounded_1_vs_original_sup.pdf
plot_WGS_HG002_Bionano_recover_13022024_rounded_2_vs_original_sup.pdf
plot_WGS_HG002_Bionano_recover_13022024_rounded_3_vs_original_sup.pdf
plot_WGS_HG002_Bionano_recover_13022024_rounded_4_vs_original_sup.pdf
Methylation correlation:
WGS_HG002_Bionano_recover_13022024.chr22_rounded_1_bi_vs_remora.pdf
WGS_HG002_Bionano_recover_13022024.chr22_rounded_2_bi_vs_remora.pdf
WGS_HG002_Bionano_recover_13022024.chr22_rounded_3_bi_vs_remora.pdf
WGS_HG002_Bionano_recover_13022024.chr22_rounded_4_bi_vs_remora.pdf
@sashajenner could you please implement a profile for this data in the dev branch for degrade? The relevant header data is as follows:
#slow5_version 0.2.0
#num_read_groups 1
@acquisition_id 014da3cd8f6521012f0430299be6ee90c8be10c8
@acquisition_start_time 2024-02-13 11:11:37.722000+00:00
@adc_max 2047
@adc_min 0
@asic_id 0004A30B01138266
@asic_id_eeprom 0004A30B01138266
@asic_temp 27.578566
@asic_version Unknown
@barcoding_enabled 0
@basecall_config_filename dna_r10.4.1_e8.2_400bps_5khz_hac_prom.cfg
@configuration_version 5.8.6
@data_source real_device
@device_id B
@device_type p2_solo
@distribution_status stable
@distribution_version 23.11.7
@exp_script_name sequencing/sequencing_PRO114_DNA_e8_2_400K:FLO-PRO114M:SQK-ULK114:400
@exp_script_purpose sequencing_run
@exp_start_time 2024-02-13T11:11:37.722531+00:00
@experiment_name WGS_HG002_Bionano_recover_13022024
@experiment_type genomic_dna
@flow_cell_id PAU64142
@flow_cell_product_code FLO-PRO114M
@fpga_board_id 0018f5206e51685c
@fpga_firmware_version 2.1.0
@guppy_version 7.2.13+fba8e8925
@heatsink_temp 33.988201
@host_product_code GRD-MK1
@host_product_serial_number GXB04189
@hostname GXB04189
@installation_type nc
@is_simulated 0
@local_basecalling 1
@operating_system ubuntu 20.04
@package bream4
@package_version 7.8.2
@protocol_group_id WGS_HG002_Bionano_recover_13022024
@protocol_name sequencing/sequencing_PRO114_DNA_e8_2_400K:FLO-PRO114M:SQK-ULK114:400
@protocol_run_id fd789ccc-282f-4e00-8532-909719d345b8
@protocol_start_time 2024-02-13T11:09:55.746097+00:00
@protocols_version 7.8.2
@run_id 014da3cd8f6521012f0430299be6ee90c8be10c8
@sample_frequency 5000
@sample_id WGS_HG002_Bionano_recover
@sample_rate 5000
@selected_speed_bases_per_second 400
@sequencer_hardware_revision HW-30
@sequencer_position P2S-00581-B
@sequencer_position_type PromethION
@sequencer_product_code PRO-SEQ002
@sequencer_serial_number P2S-00581
@sequencing_kit sqk-ulk114
@software MinKNOW 23.11.7 (Bream 7.8.2, Core 5.8.6, Dorado 7.2.13+fba8e8925)
@system_name GXB04189
@system_type GridION Mk1
@usb_config fx3_0.0.0#fpga_0.0.0#unknown#unknown
@version 5.8.6
Since this kit can be used on device types other than the PromethION 2 Solo, should we be ignoring the device_type header field? Or does the device affect the ideal number of bits to remove?