HenrikBengtsson/future.batchtools

Do I need a shared directory when using a batchtools_slurm plan?

yimyom opened this issue · 2 comments

Hi,

I'm sure I've read something about it somewhere but I can't find this information again. I've set up a small Slurm cluster with a few machines for testing purposes.
Then in R, I do:

library(future.batchtools)
plan(batchtools_slurm)
x <- future({1})
value(x)

This fails with the following error message:

Error: Log file '/home/david/.future/20200605_055425-LKTuAM/batchtools_319841923/logs/job558e266958d5c4f8d7663b7cff9d3ca7.log' for job with id 1 not available

A closer look at what's happening under the hood shows that sbatch (from Slurm) runs a job script (generated from a brew template) that ends with the line:

Rscript -e 'batchtools::doJobCollection("/home/david/.future/20200605_055425-LKTuAM/batchtools_319841923/jobs/job558e266958d5c4f8d7663b7cff9d3ca7.rds")'

And that's where I'm a bit surprised: this script, including the last line, is executed on a Slurm compute node, yet the R call inside it (doJobCollection) refers to a file in my home directory on the master node!

Does it mean I can run future on Slurm only if the home directory is shared?

EDIT:
it appears the answer is yes. As far as I can tell:

  1. sbatch (the Slurm command for submitting batch jobs) does not transfer any files except the submitted script (this is clear from the man page),
  2. and nothing in batchtools, and a fortiori nothing in future.batchtools or future, transfers the files either. That is not really a problem once a shared directory is set up.

So, as far as I can tell, a directory shared between the master node and the compute nodes is required. It doesn't need to be the home directory, as long as the location of the registry directory is specified explicitly.
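
A quick way to check whether a given directory really is shared, as a rough sketch only: drop a marker file in it on the master node and test for that file through Slurm. `visible_via` is a hypothetical helper name (it is not part of batchtools or future.batchtools), and the sketch assumes that `srun` is on the PATH of the submit host and that the standard `test` utility is available on the compute nodes.

```r
# Hypothetical helper: returns TRUE if `path` exists when checked via `launcher`.
# With launcher = "srun", the check runs on a node allocated by Slurm.
visible_via <- function(path, launcher = "srun") {
  status <- system2(launcher, c("test", "-e", shQuote(path)),
                    stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}

# Usage sketch (the shared directory below is a placeholder, not a real path):
# marker <- file.path("/global/path/alice", "visibility-check")
# file.create(marker)
# visible_via(marker)   # FALSE would suggest the directory is not shared
# file.remove(marker)
```

Note that this only tests whichever node Slurm happens to pick, so a TRUE result is a hint rather than a guarantee for the whole cluster.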

(Google, please help yourself and index my answer, as it was not obvious from the docs in batchtools)

Yes, the folder .future/ is used by {batchtools} to "communicate" (exports, results, logs, ...) with the workers, so that folder needs to be accessible from the "launch" machine as well as from all workers (= compute nodes).

By default, this folder is created in the current working directory, so its location depends on where you run future.batchtools from. In your example, it looks like you were running from your home directory.

You can customize the location of this root folder by setting the R option future.cache.path or the environment variable R_FUTURE_CACHE_PATH so that it is on a globally shared file system, e.g.

options(future.cache.path = "/global/path/alice/.future")
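
If you want to fail early when the folder is unusable, something along these lines might help. This is a minimal sketch, not something future.batchtools provides: `use_future_cache` is a hypothetical helper name, and it only verifies that the folder can be created and written to from the launch machine. It cannot tell whether the compute nodes see the same path.

```r
# Hypothetical helper: validate a cache folder, then point future.cache.path at it.
use_future_cache <- function(path) {
  # Create the folder if it does not exist yet (no error if it already does)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  if (!dir.exists(path)) {
    stop("Cannot create cache directory: ", path)
  }
  # file.access() returns 0 when the requested access (mode = 2: write) is granted
  if (unname(file.access(path, mode = 2)) != 0) {
    stop("Cache directory is not writable: ", path)
  }
  path <- normalizePath(path, mustWork = TRUE)
  options(future.cache.path = path)
  invisible(path)
}

# Usage sketch (same placeholder path as above), before setting the plan:
# use_future_cache("/global/path/alice/.future")
# library(future.batchtools)
# plan(batchtools_slurm)
```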

I'm assuming this is resolved/answered, so closing.