pod5: Empty queue or timeout
Closed this issue · 20 comments
Hi @HITzhongyu ,
Could you set the POD5_DEBUG=1 environment variable and run the same command again? The converter will then generate a number of log files which show the state of the Queue at runtime. I can use these to help resolve this issue.
POD5_DEBUG=1 pod5 convert fast5 ./fast5/*.fast5 --output debug_pod5/
Kind regards,
Rich
Hi @HalfPhoton
I changed to a new set of test data and reran the command. This time, I encountered an error right from the beginning, as follows:
However, the program still runs; it just gets stuck at 99% and throws the following error:
When I try to run POD5_DEBUG=1 pod5 convert fast5 ./test/*.fast5 --output debug_pod5/, the errors are as follows:
Kind regards,
Zhongyu
The first report shows you using -t/--threads 40, which gives a different error from the second report. You might be requesting too many resources, which is why the tool is failing to create a new process or thread, resulting in "resource temporarily unavailable". I would suggest reducing the value given to --threads.
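For example, something like the following (an illustrative command only; adjust the input path and thread count to your setup):
pod5 convert fast5 ./fast5/*.fast5 --output pod5/ --threads 8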
For the second report, which relates to the original issue raised: there should be .log files created now that POD5_DEBUG=1 is set. Can you share those with me please?
It looks like the Queue that contains the conversion tasks is becoming empty somehow, or timing out after 600 seconds for a single conversion task (which should be plenty of time for a small chunk of reads).
The log files will help me track down why this happens. Either the process is getting stuck or the queue logic is failing in your example.
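For context, here is a minimal sketch (my illustration, not pod5's actual implementation) of the pattern behind this kind of error: a worker pulls conversion tasks from a multiprocessing queue with a timeout, and an empty queue or a stalled producer surfaces as queue.Empty.

# Minimal sketch, not pod5's actual code: a worker reading conversion tasks
# from a multiprocessing queue with a 600-second timeout.
import multiprocessing
import queue

QUEUE_TIMEOUT_S = 600

def worker(task_queue: multiprocessing.Queue) -> None:
    while True:
        try:
            # Blocks until a task arrives; raises queue.Empty after the timeout.
            task = task_queue.get(timeout=QUEUE_TIMEOUT_S)
        except queue.Empty:
            # This is the situation an "Empty queue or timeout" error reports.
            raise TimeoutError("Empty queue or timeout") from None
        if task is None:  # sentinel signalling there is no more work
            break
        # ... convert one fast5 file here ...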
Kind regards,
Rich
@HalfPhoton
Here are all the log files, thanks !
2023-06-27--20-01-54-p-11518-pod5.log
2023-06-27--20-01-54-p-11519-pod5.log
2023-06-27--20-01-54-p-11520-pod5.log
2023-06-27--20-01-54-p-11511-pod5.log
2023-06-27--20-01-54-p-11512-pod5.log
2023-06-27--20-01-54-p-11513-pod5.log
2023-06-27--20-01-54-p-11514-pod5.log
2023-06-27--20-01-54-p-11517-pod5.log
2023-06-27--20-01-52-main-pod5.log
Kind regards,
Zhongyu
Hi @HITzhongyu
Thank you very much for the logs. They've been very helpful.
From the main-pod5.log we can see that one of the worker processes has been killed by a segmentation fault:
2023-06-27 20:21:44,357 DEBUG 66:'terminate_processes': ... SpawnProcess-11, stopped[SIGSEGV] daemon ...
and in the worker 11513-pod5.log we see that the log ends abruptly here:
--- Finishing previous file FAQ32498_pass_09083b73_65.fast5
2023-06-27 20:11:05,414 DEBUG 53:'convert_fast5_file':Done:37.193s
2023-06-27 20:11:05,425 DEBUG 53:'convert_fast5_file':Returned:4000
2023-06-27 20:11:05,427 INFO Enqueueing file end: FAQ32498_pass_09083b73_65.fast5 reads: 4000
2023-06-27 20:11:05,428 DEBUG c7:'enqueue_data'
--- Getting next file FAQ32498_pass_09083b73_71.fast5
2023-06-27 20:11:05,430 DEBUG 56:'get_input':(<pod5.tools.pod5_convert_from_fast5.QueueManager object at 0x7f8b6c3b5b10>,), {}
2023-06-27 20:11:05,430 DEBUG 56:'get_input':Done:0.000s
2023-06-27 20:11:05,430 DEBUG 56:'get_input':Returned:test/FAQ32498_pass_09083b73_71.fast5
--- Testing is_multi_read_fast5 on FAQ32498_pass_09083b73_71.fast5
2023-06-27 20:11:05,431 DEBUG 72:'is_multi_read_fast5':(PosixPath('test/FAQ32498_pass_09083b73_71.fast5'),), {}
--- Segfault
We'd expect to see the check complete, as it did for the previous file:
2023-06-27 20:10:26,479 DEBUG fd:'is_multi_read_fast5':(PosixPath('test/FAQ32498_pass_09083b73_65.fast5'),), {}
2023-06-27 20:10:28,220 DEBUG fd:'is_multi_read_fast5':Done:1.741s
2023-06-27 20:10:28,220 DEBUG fd:'is_multi_read_fast5':Returned:True
Can you please check that this file, test/FAQ32498_pass_09083b73_71.fast5, is not corrupt in some way?
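One quick check, assuming you have the HDF5 command-line tools available, is to try listing the file's contents, e.g.:
h5ls -r test/FAQ32498_pass_09083b73_71.fast5
If that fails with an HDF5 error, the file is likely damaged.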
Kind regards,
Rich
Hi @HalfPhoton
I tried to open test/FAQ32498_pass_09083b73_71.fast5, but HDFView can't open it.
I can upload the file to you so you can test it.
Kind regards,
Zhongyu
@HITzhongyu ,
Can you open it with Python? Using the same environment where pod5 is installed, these module imports should exist:
# Get a path to the file
from pathlib import Path
path = Path("test/FAQ32498_pass_09083b73_71.fast5")
assert path.exists()
# Can we open the file with h5py? If it fails here then the HDF5 file is corrupted somehow
import h5py
h5 = h5py.File(path)
# Is the file empty? If it fails here there's nothing to do anyway and the file should be deleted
assert len(h5) > 0
# Can pod5 check the file? If it fails here then there might be something we can do
from pod5.tools.pod5_convert_from_fast5 import is_multi_read_fast5
is_multi_read_fast5(path)
@HalfPhoton
It reports an error: Segmentation fault (core dumped)
Can you add a few print statements between tests or run it line-by-line in an interpreter to determine where the segfault occurs?
@HalfPhoton
sure!
from pathlib import Path
path = Path("/home/user/ydliu/hitbic/HG002/test/FAQ32498_pass_09083b73_71.fast5")
assert path.exists()
print("666")
import h5py
h5 = h5py.File(path)
print("777")
assert len(h5) > 0
print("888")
from pod5.tools.pod5_convert_from_fast5 import is_multi_read_fast5
print(is_multi_read_fast5(path))
Ok,
Please try this:
print("start")
with h5py.File(path) as _h5:
print("open")
print(_h5)
_h5.attrs
print("can access_h5.attrs")
print(_h5.attrs)
# The "file_type" attribute might be present on supported multi-read fast5 files.
if _h5.attrs.get("file_type") == "multi-read":
return True
print( "is not multi-read file type")
if len(_h5) == 0:
return True
print( "is not len 0")
# if there are "read_x" keys, this is a multi-read file
if any(key for key in _h5 if key.startswith("read_")):
print("found a read")
return True
print("closing handle")
print("everything is fine?!")
I modified your code because it caused an error (the return statements are outside a function):
print("start")
with h5py.File(path) as _h5:
print("open")
print(_h5)
_h5.attrs
print("can access_h5.attrs")
print(_h5.attrs)
# The "file_type" attribute might be present on supported multi-read fast5 files.
if _h5.attrs.get("file_type") == "multi-read":
print("True")
# return True
print( "is not multi-read file type")
if len(_h5) == 0:
print("True")
# return True
print( "is not len 0")
# if there are "read_x" keys, this is a multi-read file
if any(key for key in _h5 if key.startswith("read_")):
print("found a read")
# return True
print("closing handle")
print("everything is fine?!")
It reports an error:
start
open
<HDF5 file "FAQ32498_pass_09083b73_71.fast5" (mode r)>
can access_h5.attrs
<Attributes of HDF5 object at 139974581599904>
is not multi-read file type
is not len 0
Traceback (most recent call last):
File "test.py", line 40, in <module>
if any(key for key in _h5 if key.startswith("read_")):
File "test.py", line 40, in <genexpr>
if any(key for key in _h5 if key.startswith("read_")):
File "/home/user/ydliu/miniconda3/envs/remora/lib/python3.8/site-packages/h5py/_hl/group.py", line 499, in __iter__
for x in self.id.__iter__():
File "h5py/h5g.pyx", line 128, in h5py.h5g.GroupIter.__next__
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5l.pyx", line 316, in h5py.h5l.LinkProxy.iterate
RuntimeError: Link iteration failed (incorrect metadata checksum after all read attempts)
Hi @HITzhongyu ,
It does appear that your fast5 file is corrupt. This is the same issue as seen here: megalodon#279
I'm not sure what we can do other than to recommend that you check your files and drop those that are corrupt before continuing with pod5 convert.
[Edit: subset -> convert]
Apologies we don't have a better solution.
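If it helps as a workaround, here is a rough sketch (my own illustration, not part of pod5, and untested on your data) of such a pre-filter. Because the corrupt file segfaults inside h5py, the check has to run in a child process so that a crash only kills that child rather than the whole scan:

# Illustrative pre-filter, not part of pod5: open each fast5 in a child
# interpreter so a segfault or HDF5 error only kills the child process.
import subprocess
import sys
from pathlib import Path

CHECK = (
    "import sys, h5py; "
    "f = h5py.File(sys.argv[1], 'r'); "
    "list(f.keys()); "  # forces link iteration, which fails on corrupt metadata
    "f.close()"
)

def fast5_is_readable(path: Path) -> bool:
    # Any non-zero return code (including a segfault) marks the file as bad.
    result = subprocess.run(
        [sys.executable, "-c", CHECK, str(path)], capture_output=True
    )
    return result.returncode == 0

good, bad = [], []
for fast5 in sorted(Path("test").glob("*.fast5")):
    (good if fast5_is_readable(fast5) else bad).append(fast5)

print(f"{len(good)} readable, {len(bad)} possibly corrupt")
for path in bad:
    print("skip:", path)

The readable files could then be passed to pod5 convert fast5, or the bad ones moved aside first.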
Kind regards,
Rich
Hi @HalfPhoton
Thank you very much for your patient explanation.
I have another question. If the fast5 data is corrupted, why is there no issue with it during Guppy processing, but problems arise specifically with pod5?
Regarding this issue, could a filtering step be performed before converting with pod5, skipping any damaged fast5 files that are recognized as single-read fast5, without affecting the rest of the run? If there are only a few such damaged files, it should not impact the results of large-scale methylation detection.
Or, if it's convenient for you, could you please let me know which part of the code needs to be modified? I can make the changes on my end.
Kind regards,
Zhongyu
pod5 convert will try to ignore bad fast5 unless --strict is set. We removed the up-front fast5 checking because it was so slow.
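For example (illustrative commands, with the skip/fail behaviour as described above):
pod5 convert fast5 ./test/*.fast5 --output pod5/            # default: bad fast5 files are ignored where possible
pod5 convert fast5 ./test/*.fast5 --output pod5/ --strict   # --strict: bad files are not ignored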
In your case, the files are causing a prompt segfault which kills the worker process immediately instead of allowing it to handle the error gracefully. This is an issue with h5py.
There are potential changes we could make to how we handle dead workers, which we might investigate.
As for how Guppy can handle this file when pod5 cannot: I'm not sure, but Guppy does not use Python / h5py, which is where I believe the issue originates.
Kind regards,
Rich
Edit: subset -> convert
Hi @HalfPhoton
I find that pod5 subset checks pod5 files, not fast5:
usage: pod5 subset [-h] [-o OUTPUT] [-r] [-f] [-t THREADS] [--csv CSV]
[-s TABLE] [-R READ_ID_COLUMN] [-c COLUMNS [COLUMNS ...]]
[--template TEMPLATE] [-T] [-M] [-D]
inputs [inputs ...]
Given one or more pod5 input files, take subsets of reads into one or more pod5 output files by a user-supplied mapping.
positional arguments:
inputs Pod5 filepaths to use as inputs
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Destination directory to write outputs (default:
/home/user/ydliu/hitbic/HG002)
-r, --recursive Search for input files recursively matching `*.pod5`
(default: False)
-f, --force-overwrite
Overwrite destination files (default: False)
-t THREADS, --threads THREADS
Number of subsetting workers (default: 8)
direct mapping:
--csv CSV CSV file mapping output filename to read ids (default:
None)
table mapping:
-s TABLE, --summary TABLE, --table TABLE
Table filepath (csv or tsv) (default: None)
-R READ_ID_COLUMN, --read-id-column READ_ID_COLUMN
Name of the read_id column in the summary (default:
read_id)
-c COLUMNS [COLUMNS ...], --columns COLUMNS [COLUMNS ...]
Names of --summary / --table columns to subset on
(default: None)
--template TEMPLATE template string to generate output filenames (e.g.
"mux-{mux}_barcode-{barcode}.pod5"). default is to
concatenate all columns to values as shown in the
example. (default: None)
-T, --ignore-incomplete-template
Suppress the exception raised if the --template string
does not contain every --columns key (default: None)
content settings:
-M, --missing-ok Allow missing read_ids (default: False)
-D, --duplicate-ok Allow duplicate read_ids (default: False)
Example: pod5 subset inputs.pod5 --output subset_mux/ --summary summary.tsv --columns mux
Sorry, my error. I meant to say pod5 convert, not pod5 subset, when explaining the --strict option above.
Are you happy with the solution, @HITzhongyu? Can we close this issue?