earthlab/rslurm

get_slurm_out

Closed · 1 comment

mcuma commented

Hello,

thanks for creating this package, which fills a real gap in R.

I have an issue/question with get_slurm_out(). It seems to submit another SLURM job to gather the results from the slurm_apply jobs, but since it does not accept slurm_options the way slurm_apply does, in our case it submits to the default partition/account, which I don't want to use (it may sit in the queue for a long time before executing).

My question is twofold (without looking at the source code yet):

  1. Why does get_slurm_out() even have to run another job, if the output from the slurm_apply job(s) is stored in the user's home directory (which is mounted on all cluster nodes; I can't think of a cluster setup where this is not the case)? I don't understand why we would need to run another job to retrieve these outputs. I guess the only case would be a user submitting SLURM jobs e.g. from a laptop that does not mount the cluster home directory, but then the laptop would need access to the SLURM commands, so it would have to be tied to the cluster somewhat anyway (e.g. mount the file space where SLURM is installed). A rough sketch of what I mean is below, after item 2.

  2. If get_slurm_out() does need to run another SLURM job, then it would be good to add a slurm_options argument to it so that we can submit that job with user-modified parameters, e.g., in my case:
    sjob1 <- slurm_apply(ftest, pars, nodes=2, slurm_options = list(account="notchpeak-shared-short", partition="notchpeak-shared-short"))
    res <- get_slurm_out(sjob1, "table", slurm_options = list(account="notchpeak-shared-short", partition="notchpeak-shared-short"))
    while my defaults are account="chpc" and partition="notchpeak".
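
For reference, here is a rough sketch of what I mean in item 1: reading the result files straight off the shared filesystem, without submitting anything. I'm assuming rslurm's default layout of one results_<n>.RDS file per node inside a _rslurm_<jobname> folder (the folder and file names are just what I see in the job directory, so treat them as an assumption):

    # read the per-node result files directly; no extra SLURM job needed
    job_dir   <- paste0("_rslurm_", sjob1$jobname)        # folder created by slurm_apply
    res_files <- list.files(job_dir, pattern = "^results_.*\\.RDS$", full.names = TRUE)
    res_list  <- do.call(c, lapply(res_files, readRDS))   # flatten the per-node lists
    res_table <- do.call(rbind, lapply(res_list, as.data.frame))  # roughly outtype = "table"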

Thanks,
MC

get_slurm_out() does not start a new SLURM job. There is an ncores option to parallelize the retrieval of the output, but that work is done locally (on the machine where get_slurm_out() is called), not submitted to SLURM.
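
For example, a minimal sketch (reusing the ftest/pars objects and the partition/account names from your example): the second call reads the results_*.RDS files from the job folder on the machine where it runs, optionally across a few local cores, and never submits anything to SLURM:

    library(rslurm)

    # submit the actual computation to SLURM, as in the example above
    sjob1 <- slurm_apply(ftest, pars, nodes = 2,
                         slurm_options = list(account = "notchpeak-shared-short",
                                              partition = "notchpeak-shared-short"))

    # runs locally: collects the output files written by the job,
    # using up to 2 local cores; no new SLURM job is submitted
    res <- get_slurm_out(sjob1, outtype = "table", ncores = 2)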