Segmentation fault when disk space is low
pinin4fjords opened this issue · 3 comments
Hi!
(The following is a reposting of marcelm/cutadapt#640, I'd thought the issue was coming from there, but @marcelm suggested that TrimGalore may be the proper place for a fix.)
When running cutadapt 3.4 on Python 3.9.6 via trim_galore 0.6.7 in a Docker container on AWS Batch, through the nf-core RNA-seq Nextflow workflow, as follows:
trim_galore \
--fastqc \
--cores 4 \
--paired \
--gzip \
SRX8042381_1.fastq.gz \
SRX8042381_2.fastq.gz
... which reported Cutadapt command line parameters like:
-j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC SRX8042381_1.fastq.gz
... I encountered an error like:
Writing final adapter and quality trimmed output to SRX8042381_1_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file SRX8042381_1.fastq.gz <<<
sh: line 1: 296 Segmentation fault (core dumped) pigz -p 4 -c - > SRX8042381_1_trimmed.fq.gz
This turned out to be caused by a lack of disk space available to the batch job: running the above command with a single core instead produced an error like:
...
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
gzip: write error: No space left on device
If this has been addressed in more up-to-date versions of TrimGalore than the one used in that workflow, then great, please disregard. If this is still something that would happen, could there maybe be some mechanism to catch the problem and provide a more intelligible error message than the segfault?
Thank you!
Hi @pinin4fjords,
sorry, this issue seems to have completely escaped me... Having said that, I am not sure I will be able to offer help either, to be perfectly honest. Trim Galore doesn't do any checks of file sizes or available disk space at all, and the issue here seems to be that pigz caused the segmentation fault and core dump. Writing output to a pipe (here to pigz) is notoriously difficult to debug, especially since errors like that often cause the program to get killed... I'd be happy to be told otherwise though...
If this was run via an nf-core pipeline, maybe they would have some functionality to check the log file for 'error' to produce a more helpful error message?
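On the point about pipe errors being hard to surface: one possible approach, sketched here in Python rather than in Trim Galore's own code, is to check the exit status of the compressor process after writing to it. The function name `write_through_compressor` and its parameters are hypothetical and only for illustration; this is not how Trim Galore, cutadapt or xopen actually handle it.

```python
import subprocess


def write_through_compressor(cmd, chunks, out_path):
    """Pipe byte chunks through `cmd` and write its stdout to `out_path`.

    Hypothetical helper: raises RuntimeError if the child process exits
    with a non-zero status or is terminated by a signal.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=out)
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        except BrokenPipeError:
            # The child stopped reading (it may have crashed); its exit
            # status, checked below, is what tells us why.
            pass
        finally:
            try:
                proc.stdin.close()
            except BrokenPipeError:
                pass
        returncode = proc.wait()

    # On POSIX, a negative return code means "killed by signal N".
    if returncode < 0:
        raise RuntimeError(f"{cmd[0]} was killed by signal {-returncode}")
    if returncode > 0:
        raise RuntimeError(f"{cmd[0]} exited with status {returncode}")
```

With something like `write_through_compressor(["pigz", "-p", "4", "-c"], reads, "out.fq.gz")`, a crash in pigz would at least be reported as a failure of the compressor, although it still would not say that the underlying cause was a full disk.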
@FelixKrueger yes, it's a pigz thing, in that writing single-threaded triggers a more specific error. I was hoping (perhaps unrealistically) that a check for available space might be possible, but understand if it's a bit out of scope.
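For what it's worth, a pre-flight check for available space could look roughly like the Python sketch below. It assumes, purely as a heuristic, that the trimmed output needs about as much space as the compressed input; `ensure_free_space` and `safety_factor` are made-up names, not part of any existing tool. Such a check cannot guarantee success, since other processes may fill the disk while the job runs.

```python
import errno
import os
import shutil


def ensure_free_space(out_dir, input_paths, safety_factor=1.0):
    """Raise OSError(ENOSPC) if `out_dir` looks too full for the outputs.

    Assumption (heuristic only): the output needs roughly
    safety_factor * total input size in bytes.
    Returns the number of free bytes when the check passes.
    """
    required = int(sum(os.path.getsize(p) for p in input_paths) * safety_factor)
    free = shutil.disk_usage(out_dir).free
    if free < required:
        raise OSError(
            errno.ENOSPC,
            f"Only {free} bytes free in {out_dir}, "
            f"but about {required} bytes are expected to be needed",
        )
    return free
```

Calling this before trimming starts, for example with the two input FASTQ files and the output directory, would turn at least some out-of-space situations into a readable error up front.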
Hopefully pycompression/xopen#111 may have improved the situation somewhat.
Great, thanks for the pointer.