marbl/canu

Unusual memory failure issues on HPC

Hi,
I'm currently trying to run a plant genome assembly with Nanopore reads on a node with 1 TB of RAM, and my runs keep getting killed during the "Starting 'ovS' concurrent execution" step of the correction stage, with an error from Slurm: "Some of your processes may have been killed by the cgroup out-of-memory handler".

Is there a way I can troubleshoot this and lower the RAM requirements? It's unlikely I will get access to additional RAM.
The log reports the following allocations at the start of the run:

-- Local: meryl     24.000 GB    8 CPUs x  16 jobs   384.000 GB 128 CPUs  (k-mer counting)
-- Local: hap       12.000 GB   16 CPUs x   8 jobs    96.000 GB 128 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   32.000 GB   16 CPUs x   8 jobs   256.000 GB 128 CPUs  (overlap detection with mhap)
-- Local: obtovl    16.000 GB   16 CPUs x   8 jobs   128.000 GB 128 CPUs  (overlap detection)
-- Local: utgovl    16.000 GB   16 CPUs x   8 jobs   128.000 GB 128 CPUs  (overlap detection)
-- Local: cor        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x 128 jobs   512.000 GB 128 CPUs  (overlap store bucketizer)
-- Local: ovs       16.000 GB    1 CPU  x 128 jobs  2048.000 GB 128 CPUs  (overlap store sorting)
-- Local: red       32.000 GB    8 CPUs x  16 jobs   512.000 GB 128 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x 128 jobs  1024.000 GB 128 CPUs  (overlap error adjustment)
-- Local: bat      256.000 GB   16 CPUs x   1 job    256.000 GB  16 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    8 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
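
For reference, if I'm reading the 'ovs' line right, the sorting jobs alone could grab 128 jobs x 16 GB = 2048 GB, double what the node actually has, which would explain the cgroup kills. Below is a sketch of how I imagine a re-run with explicit caps might look; the option names (maxMemory, maxThreads, ovsMemory, ovsConcurrency) are taken from the canu parameter documentation, but the values are guesses for this node and untested:

# Untested sketch: cap the total memory canu plans against, and pin the
# overlap-store sort jobs explicitly so the local concurrency fits the node.
# 32 concurrent ovs jobs x 16 GB = 512 GB, well under the 1 TB cgroup limit.
# The read-type flag depends on canu version (-nanopore-raw in 1.x, -nanopore in 2.x).
canu -p asm -d asm-canu \
  genomeSize=<my-genome-size> \
  -nanopore-raw reads.fastq.gz \
  maxMemory=1000 maxThreads=128 \
  ovsMemory=16 ovsConcurrency=32

Would something along these lines be the right direction, or is there a better way to diagnose which stage is actually overrunning the cgroup?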

Thanks.