ribodetector_cpu hangs with SLURM
gianfilippo opened this issue · 4 comments
Hi,
I tried your package on an interactive SLURM session, and it worked.
I then tried to submit it as a job via SLURM, and it hangs at:
2023-03-09 16:13:36 : INFO Using high MCC model file: /home/conda_envs/ribodetector/lib/python3.9/site-packages/ribodetector/data/ribodetector_600k_variable_len70_101_epoch47.onnx on CPU
I already tried reinstalling, and nothing changed.
The command I issued in both sessions is
ribodetector_cpu -t 8 -l 92 -i $FASTQ1.fq.gz $FASTQ2.fq.gz -e rrna -o $outFASTQ1.nonrrna.1.fq $outFASTQ2.nonrrna.2.fq
What can I do?
Thanks
Could you post your SLURM script or the command used to submit the job? You need to set --cpus-per-task to the number of CPU cores you need and set --threads-per-core to 1.
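For example, a minimal submission script along these lines could look like the sketch below (the resource values and file names are placeholders, not taken from this issue; the ribodetector_cpu flags are the same ones used in the original report):

#!/usr/bin/env bash
#SBATCH --cpus-per-task=8       # match the -t value passed to ribodetector_cpu
#SBATCH --threads-per-core=1    # one software thread per physical core
#SBATCH --mem-per-cpu=4G        # placeholder; size this to your data

# Run with the same thread count that SLURM allocated to the task
ribodetector_cpu -t "$SLURM_CPUS_PER_TASK" -l 92 \
  -i sample_R1.fq.gz sample_R2.fq.gz \
  -e rrna \
  -o sample.nonrrna.1.fq sample.nonrrna.2.fq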
I'm running into the same issue here. I submit it with sbatch, and it runs within a Singularity container from here.
At the start there are two active processes on the node, but after 5 minutes there is nothing going on anymore.
This is my script:
#!/usr/bin/env bash
#SBATCH --time=1-00:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=12
#SBATCH --threads-per-core=1
cd /workdir
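# Estimate the mean read length from the first 1000 lines (i.e. the first 250 reads)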
MEAN_READ_LENGTH=`zcat results/fastp/MP_35_R1_trimmed.fastq.gz | head -1000 | awk '{if(NR%4==2) {count++; bases += length} } END {print int(bases/count)}' || true`
echo "Estimated read length: $MEAN_READ_LENGTH"
singularity exec containers/ribodetector_0.2.7-cpu.sif \
ribodetector_cpu \
--len "$MEAN_READ_LENGTH" \
--threads "$SLURM_CPUS_PER_TASK" \
--input results/fastp/MP_35_R1_trimmed.fastq.gz results/fastp/MP_35_R2_trimmed.fastq.gz \
--output results/ribodetector/MP_35_R1.fastq.gz results/ribodetector/MP_35_R2.fastq.gz \
--rrna results/ribodetector/MP_35_R1_rrna.fastq.gz results/ribodetector/MP_35_R2_rrna.fastq.gz \
--ensure rrna
It works now. The issue was not setting --chunk_size, which led to memory issues. RTFM...
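For reference, adding that flag to the script above would look roughly like this (the chunk size of 256 is only an illustrative starting point, not a value from this thread, and should be tuned to the memory available per task):

singularity exec containers/ribodetector_0.2.7-cpu.sif \
    ribodetector_cpu \
    --len "$MEAN_READ_LENGTH" \
    --threads "$SLURM_CPUS_PER_TASK" \
    --chunk_size 256 \
    --input results/fastp/MP_35_R1_trimmed.fastq.gz results/fastp/MP_35_R2_trimmed.fastq.gz \
    --output results/ribodetector/MP_35_R1.fastq.gz results/ribodetector/MP_35_R2.fastq.gz \
    --rrna results/ribodetector/MP_35_R1_rrna.fastq.gz results/ribodetector/MP_35_R2_rrna.fastq.gz \
    --ensure rrna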
It is great that you figured out the solution. This will be beneficial to other users. I will incorporate this into the FAQ in the README.