mschubert/clustermq

SGE submission fails due to invalid parallel environment in default template

nickholway opened this issue · 7 comments

I'm trying to submit a trivial job on an Altair Grid Engine (formerly UGE & SGE) cluster. Submission fails because the generated job script requests a parallel environment without naming it:

library(clustermq)
fx <- function(x) x * 2
Q(fx, x=1:6, n_jobs = 3)
Submitting 3 worker jobs (ID: cmq9197) ...
Unable to read script file because of error: ERROR! -pe option must have range as 2nd argument
[1] "#$ -N cmq9197\n#$ -j y\n#$ -o /dev/null\n#$ -cwd\n#$ -V\n#$ -t 1-3\n#$ -pe 1\n\nulimit -v $(( 1024 * 4096 ))\nCMQ_AUTH=dauia R --no-save --no-restore -e 'clustermq:::worker(\"tcp://REDACTED:9197\")'\n"
Error in (function (n_jobs, ..., log_worker = FALSE, verbose = TRUE)  : 
  Job submission failed with error code FALSE
In addition: Warning message:
In system2("qsub", input = filled, stdout = TRUE) :
  running command ''qsub' < '/tmp/Rtmp5qbIBg/file467e46f4be53'' had status 2

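{U,S}GE parallel environments (PE) are requested with the option -pe <parallel environment name> <number of cores>. The generated script above contains only "#$ -pe 1", i.e. a core count with no PE name, which is why qsub rejects it. It would be good to support a named PE in Q.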
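To make the failure concrete, here is a minimal shell sketch that flags any `#$ -pe` directive lacking the `<name> <slots>` pair. The function name `check_pe_directives` is made up for illustration and is not part of clustermq or Grid Engine. The sample directives are taken from the script printed above, and `smp` is only an example PE name that may not exist on a given cluster.

```shell
# Hypothetical helper: report "#$ -pe" directives that do not have exactly
# two arguments (PE name and slot count or range). Reads files or stdin.
check_pe_directives() {
  awk '$1 == "#$" && $2 == "-pe" && NF != 4 { print "malformed: " $0; bad = 1 }
       END { exit bad }' "$@"
}

# The directive from the failing script: slot count only, no PE name.
if printf '#$ -t 1-3\n#$ -pe 1\n' | check_pe_directives; then
  echo "pe directive looks fine"
else
  echo "pe directive needs a PE name"
fi

# The same directive with a PE name ("smp" is an assumed name) passes.
printf '#$ -t 1-3\n#$ -pe smp 1\n' | check_pe_directives && echo "ok"
```

This only checks the shape of the directive. Whether the named PE actually exists is decided by the cluster configuration.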

For single-core jobs you don't need to specify a parallel environment at all, so I wonder whether it would be better not to specify one by default in the template.

Session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/prog/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clustermq_0.8.95.3

loaded via a namespace (and not attached):
 [1] compiler_4.2.0   fastmap_1.1.0    R6_2.5.1         cli_3.3.0        htmltools_0.5.2  tools_4.2.0     
 [7] yaml_2.3.5       Rcpp_1.0.8.3     rmarkdown_2.14   codetools_0.2-18 knitr_1.39       xfun_0.31       
[13] digest_0.6.29    rlang_1.0.2      evaluate_0.15   

You can already specify your own keys in the template, analogous to environments.

I will likely provide a default key in a future release.
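As a rough sketch of the custom-key approach, the following writes a site-specific SGE template in which the PE name is its own key. The file path and the key name `pe` are hypothetical, `smp` is just an example default, and the other placeholders are reconstructed from the rendered script above and from memory of clustermq's shipped template, so they should be checked against the installed version.

```shell
# Hypothetical location for a custom template file.
template="${TMPDIR:-/tmp}/clustermq_sge.tmpl"

# Quoted 'EOF' keeps the shell from expanding $(( ... )) and the placeholders.
cat > "$template" <<'EOF'
#$ -N {{ job_name }}
#$ -j y
#$ -o {{ log_file | /dev/null }}
#$ -cwd
#$ -V
#$ -t 1-{{ n_jobs }}
#$ -pe {{ pe | smp }} {{ cores | 1 }}

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
EOF

# Show the directive that now carries both a PE name and a slot count.
grep -F -- '-pe' "$template"
```

In R, setting `options(clustermq.template = ...)` to this file and passing something like `template = list(pe = "smp", cores = 1)` to `Q()` should then fill in the new key, assuming your clustermq version supports the `{{ key | default }}` placeholder syntax used here.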

I'm running into this too, and I think we just need "smp" after "-pe" like in this line.

Opened a quick PR: #289. Works on my company's SGE cluster.

Is smp the default name of the parallel environment, or is that just what it is called at your company?

I'm happy to merge if this is a general solution. For instance, does it solve the issue for you, @nickholway?

That's a great point, I haven't used SGE anywhere else, so I don't know.

It's smp at my employer too, so this should work for us.

I cannot for the life of me remember if "smp" is the default parallel environment for a single node in {S,U,A}GE though.
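PE names are defined by the cluster administrator, so one way to check is to list them on a submit host with `qconf -spl`, which prints the configured PE names one per line. The small helper below is only a sketch: the name `pe_in_list` is made up, and the sample list in the demo is invented rather than taken from a real cluster.

```shell
# Hypothetical helper: succeed if the PE name in $1 appears as a whole
# line on stdin (e.g. in the output of `qconf -spl`).
pe_in_list() {
  grep -qxF -- "$1"
}

# On a Grid Engine submit host this would be:
#   qconf -spl | pe_in_list smp && echo "smp is configured"

# Demo with an invented PE list so the snippet runs anywhere.
printf 'make\nsmp\n' | pe_in_list smp && echo "smp is configured"
```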

I'll merge; please reopen if this causes any issues down the line.