hasindu2008/slow5tools

memory issue

svennd opened this issue · 9 comments

I have a few runs that give this error : (converted from fast5 to blow5 then merged into a single file)

# slow5tools stats merged.blow5
file path       merged.blow5
file version    0.2.0
file format     BLOW5
record compression method       zlib
signal compression method       svb-zd
number of read groups   2
number of auxiliary fields      6
auxiliary fields        end_reason,channel_number,median_before,read_number,start_mux,start_time
[stats_main] counting number of slow5 records...
[slow5_get_next_mem::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5.c:3240
[stats_main::ERROR] Error reading the file.

The machine isn't under any memory pressure. (total 128 GB memory) and slow5tool doesn't seem to be using much memory either during the run. (it looks to be CPU bound) The merged.blow5 is about 387 GB large.

Is this a bug or somewhat expected with larger blow5 files ? I use the stats to get the "number of records" to compare with as a sanity check, if there is an other method i'm open to suggestions :)

Thanks!

Hi

Can i know the slow5tools version you are using?
Also was it the slow5tools cat command or merge command that you used?

Hey,

Yes sorry for omitting this information.
slow5tools 0.6.0

I do the following :

slowtool f2s run/ -d /run.tmp -p 32
slowtool merge run.tmp/ -o run/merged.blow5 -t 32

It might have to do with a local or run specific problem but i'm unsure how I could debug it, or find out what is the problem.

Thanks

This seems to be a very rare occurrence and is likely to be a bug. It is not have anything to do with memory as 128GB is more than enough, nor is the file size as I have used this version on a 3TB BLOW5 file recently.

Could you kindly run the following command:
slow5tools skim run/merged.blow5 and let me know if it prints anything at all before giving the same error.
If it prints anything let me know how many records it prints.

Also, could you do a slow5tools quickcheck run/merged.blow5 and see what happens?

Do you happen to have any log output during the merge command? If you can run the merge command again (it might waste some compute time, I believe) as slow5tools --verbose 10 merge run.tmp/ -o run/merged.blow5 -t 32 2> merge.log and send me the merge.log, I can have a look if something weird is going causing any file corruption.

Another check you can do is to see if all the individual outputs produced by f2s are intact. For that please run the following BASH loop.

for file in  run.tmp/*.blow5; do slow5tools stats $file || { echo "Error in $file"; break;}; done

Hey hasindu2008,

When checking the log of the conversion, I noticed this experiment has 2 flowcells. Perhaps this is the source of the issue ?

[list_all_items] Looking for '*.fast5' files in /
[list_all_items] Looking for '*.fast5' files in /ori
[list_all_items] Looking for '*.fast5' files in /ori/run1
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/other_reports
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/fast5_fail
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/fast5_pass
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/fastq_fail
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/log_jobs
[list_all_items] Looking for '*.fast5' files in /ori/run1/no_sample/20220615_1551_1C_PAJ06720_7c321db0/fastq_pass
[list_all_items] Looking for '*.fast5' files in /ori/run2
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/fast5_fail
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/fast5_pass
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/log_jobs
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/other_reports
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/fastq_fail
[list_all_items] Looking for '*.fast5' files in /ori/run2/no_sample/20220614_1419_1C_PAJ06720_51e687e7/fastq_pass
[f2s_main] 1481 fast5 files found - took 0.009s
[f2s_main] Just before forking, peak RAM = 0.000 GB
[f2s_iop] 32 proceses will be used.
[f2s_iop] Spawning 32 I/O processes to circumvent HDF hell.
[f2s_child_worker::INFO] Summary - total fast5: 47, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 24, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 47, bad fast5: 0
-- removed duplicates --
[f2s_child_worker::INFO] Summary - total fast5: 47, bad fast5: 0
[f2s_main] Converting 1481 fast5 files took 2538.335s
[f2s_main] Children processes: CPU time = 65461.502 sec | peak RAM = 0.254 GB

[main] cmd: /root/slow5tools/slow5tools-v0.6.0/slow5tools f2s / -d /tape/incoming/run2.tmp/ -p 32
[main] real time = 2538.347 sec | CPU time = 0.019 sec | peak RAM = 0.003 GB
[list_all_items] Looking for '*.slow5' files in /tape/incoming/run2.tmp/
[merge_main] 1481 files found - took 0.003s

[merge_main] Allocating new read group numbers - took 2.034s

the skim command returns a huge amount of lines, i'm now running | wc -l

The quickcheck gives an error :
[quickcheck_main::ERROR] No valid slow5 eof marker at the end of the BLOW5 file.

Ill rerun the conversion & merge manual and keep the logs. Thanks for the help!

It seems like somehow your merge job got prematurely ended. See if it crashed midway?

@svennd having flowcells cannot be an issue. Recently I combined like 50 flowcells.
Looking at what quickcheck says it is very much like a premature ending and the EOF marker never had a chance to get written. See what happens when you run merge again and let me know.

I'm rerunning this now, but seeing the storage makes me think that this might have been a "full disk" error. The merge is running now.

Yes, now the merge worked w/o a problem (and stat). This fault most likely had to do with a full disk, so the file write was stopped in the middle. I feel like the issue is resolved, so ill close it.

Thanks for the help and thank you for developing this amazing toolbox!