chklovski/CheckM2

Random errors on clusters

yexianingyue opened this issue · 3 comments

Hello,
When I submit the checkm2 command to the cluster with sbatch, I get a different error each time, for example "ERROR: Make sure prodigal is on your system path." or "ERROR: File does not exist: input_5751.gff.4". Sometimes I also get "sh: fork: retry: No child processes". However, when I log in to a compute node and run the command directly, I do not have any problems.

Another issue is that when I specify the number of threads, checkm2 seems to spawn more child processes than that number. Why does this happen?

Sometimes the software hangs at "Calling genes in 1000 bins with 32 threads:", and when I supply protein sequences as input instead, it hangs at "Calculating metadata for 10000 bins with 36 threads:".

Many thanks for your time and your help,


Here is some additional information.

checkm2 command:

source activate checkm2
checkm2 predict -x fa -t 32 --tmpdir ckm2.tmp  --input bin_dir --output-directory ckm2 --remove_intermediates --force
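
One thing worth ruling out is a mismatch between the -t value and the CPUs Slurm actually granted to the job; the sbatch options used are not shown in this post. The sketch below assumes the job is submitted with --cpus-per-task, in which case Slurm exports SLURM_CPUS_PER_TASK. pick_threads is a hypothetical helper for illustration and is not part of CheckM2.

```shell
# Hypothetical helper: cap the requested thread count at the number of CPUs
# Slurm allocated per task. SLURM_CPUS_PER_TASK is only set when
# --cpus-per-task was passed to sbatch, so fall back to the requested value.
pick_threads() {
    pt_requested="$1"
    pt_allocated="${SLURM_CPUS_PER_TASK:-$pt_requested}"
    if [ "$pt_requested" -gt "$pt_allocated" ]; then
        echo "$pt_allocated"
    else
        echo "$pt_requested"
    fi
}

# Example use inside the job script (left commented out here):
# threads=$(pick_threads 32)
# checkm2 predict -x fa -t "$threads" --tmpdir ckm2.tmp --input bin_dir \
#     --output-directory ckm2 --remove_intermediates --force
```

If the job was allocated fewer CPUs than 32, this passes the smaller number to -t; whether that removes the errors described above is not established here.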

node info:

$ lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                36
On-line CPU(s) list:   0-35
Thread(s) per core:    1
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
Stepping:              7
CPU MHz:               1199.963
CPU max MHz:           4000.0000
CPU min MHz:           1200.0000
BogoMIPS:              6200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17
NUMA node1 CPU(s):     18-35
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni spec_ctrl intel_stibp flush_l1d arch_capabilities

ps xf

113838 ?        S      0:00 /bin/bash /var/spool/slurm/d/job407430/slurm_script
113906 ?        Sl     0:20  \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114068 ?        Sl     0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114073 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114079 ?        Sl     0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114086 ?        Sl     0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114094 ?        Sl     0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114103 ?        Sl     0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114113 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114124 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114130 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114137 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114144 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114156 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114166 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114175 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114185 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114193 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114202 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114211 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114220 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114224 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114238 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114243 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114256 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114266 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114276 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114290 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114307 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114320 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114334 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114345 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114355 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114368 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114378 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114391 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114401 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114414 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114425 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114430 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114432 ?        S      0:00      \_ python checkm2 predict -x fa -t 32 --tmpdir ckm2.000000.tmp --input bin_dir.000000 --output-directory ckm2.000000 --remove_intermediates --force
114440 ?        Z      0:00      \_ [checkm2] <defunct>
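
The listing above shows 39 live child entries plus one defunct entry under PID 113906, for a run started with -t 32. To repeat that comparison without counting by hand, the direct children of a process can be tallied from ps output. This is a minimal sketch; count_children is a hypothetical helper that reads "PID PPID" pairs on standard input.

```shell
# Hypothetical helper: count how many lines of "PID PPID" input name $1 as
# the parent. Prints 0 when there are no matches.
count_children() {
    awk -v parent="$1" '$2 == parent { n++ } END { print n + 0 }'
}

# Example use on the compute node (left commented out here); 113906 is the
# main checkm2 PID from the listing above:
# ps -e -o pid= -o ppid= | count_children 113906
```

Note that this counts processes, including defunct ones, and says nothing about threads inside each process.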

checkm2.log

[03/07/2024 02:22:25 PM] INFO: Running CheckM2 version 1.0.1
[03/07/2024 02:22:25 PM] INFO: Running quality prediction workflow with 32 threads.
[03/07/2024 02:23:07 PM] INFO: Calling genes in 1000 bins with 32 threads:
[03/07/2024 02:23:07 PM] ERROR: Make sure prodigal is on your system path.
[03/07/2024 02:23:07 PM] ERROR: Make sure prodigal is on your system path.

Interestingly, although it indicated that prodigal could not be found, a portion of the proteins was successfully predicted, and then the program froze.

Hi,

Does checkm2 testrun complete successfully? Was this installed via conda? It sounds like perhaps file permission issues might be playing a role, but it's hard to say with such generic errors.
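
Because the errors only appear under sbatch, it may help to collect this information from inside the batch job itself rather than from an interactive shell. The following is a minimal sketch under that assumption; report_tool and report_limits are hypothetical helpers, and the process limit is included only because fork failures are often associated with it, which has not been confirmed as the cause here.

```shell
# Hypothetical helper: report whether a command resolves on PATH in the
# current environment. Returns non-zero when it does not.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT on PATH"
        return 1
    fi
}

# Hypothetical helper: print settings that can differ between a batch job
# and an interactive login. Not every shell supports "ulimit -u", so fall
# back to "unknown" when it fails.
report_limits() {
    echo "max user processes: $(ulimit -u 2>/dev/null || echo unknown)"
    echo "SLURM_CPUS_PER_TASK: ${SLURM_CPUS_PER_TASK:-unset}"
}

# Example use in the job script, after activating the environment
# (left commented out here):
# source activate checkm2
# report_tool prodigal
# report_limits
# checkm2 testrun
```

Comparing this output between an sbatch job and an interactive session on the same node would show whether PATH or the limits differ between the two.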

Please feel free to reopen if you have further information or error persists.