pod5 filter freeze
diego-rt opened this issue · 3 comments
Hello,
When running `pod5 filter`, the process often freezes at random. Rerunning it usually results in successful completion. This is annoying because the process doesn't exit with an error code; it just hangs until timeout.
This is the command:

```
pod5 filter ${pod5_dir} -t ${task.cpus} -r --ids filtered.channel_\${channel}.txt --missing-ok --output ./filtered.channel_\${channel}.pod5
```
This is the output:

```
Parsed 98 reads_ids from: filtered.channel_1381.txt
terminate called without an active exception
```
Thanks!
Hi @diego-rt ,
We're reworking `subset`, which is the underlying process used by `filter`, to significantly lower resource usage and improve performance. Also mentioned here: #93 (comment)

We'll hopefully get this out before year end.
However, to help out in the meantime: it looks like you're running in Nextflow, based on the syntax of your command. I'd recommend trying / exploring the following, which will hopefully improve reliability.
- Reduce `-t ${task.cpus}`: this only has a small effect in `filter` and doesn't affect its runtime performance.
- Increase the memory allocated for the task.
- Use `maxForks` to limit the number of concurrent tasks. Reducing the number of parallel tasks might improve stability, especially with a large number of input files, as there can be a very large number of open file descriptors during filtering / subsetting.
- Use `errorStrategy 'retry'` to retry failing jobs automatically.
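Putting those directives together, a minimal sketch of what the process block might look like (the process name, memory value, fork limit, and retry count below are illustrative placeholders, not taken from your pipeline):

```groovy
// Hypothetical Nextflow process combining the suggestions above.
// Adjust maxForks / memory / maxRetries to suit your cluster.
process FILTER_CHANNEL {
    maxForks 10               // limit concurrent tasks to keep open file descriptors down
    memory '6 GB'             // more headroom than the current 3 GB
    errorStrategy 'retry'     // resubmit failed tasks automatically
    maxRetries 3

    input:
    path pod5_dir
    path ids_file

    output:
    path "filtered.*.pod5"

    script:
    """
    pod5 filter ${pod5_dir} -r --ids ${ids_file} --missing-ok \\
        --output ./filtered.${ids_file.baseName}.pod5
    """
}
```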
I hope these points help in the meantime and we'll get back to you soon with an update.
Kind regards,
Rich
Hi @HalfPhoton ,
Yes indeed, I'm using Nextflow with only one thread and 3 GB of memory per task. I think the issue is that I've heavily parallelized it, so several hundred jobs are simultaneously accessing the same file, which leads to an understandable I/O error. I should probably reduce the number of forks, that's true.
But I think the main problem is that the process hangs without exiting. It would be fine if it just died with an error exit code, because it would then simply retry; since it does not actually exit, the process sits there until timeout.
Yes, you're absolutely correct. These changes will be incorporated into the new design of `filter` and `subset`, which will be more stable for large numbers of inputs / outputs and scale better for use cases like your own.
Best regards,
Rich