HenrikBengtsson/future.batchtools

Problem forwarding batchtools resources to individual futures

wlandau opened this issue · 4 comments

Summary

When I supply a resources list to a batchtools future, the resources seem to be ignored. I was alerted to this via targets, ropensci/targets#562 (comment) and ropensci/targets#632 (cc @wresch, @sdechaumet)

Reproducible example

Using an SGE cluster with this template:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -o <%= log.file %>
#$ -V
#$ -N <%= job.name %>
#$ -pe smp <%= resources$slots %>
module load R/4.0.3
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0

R console:

library(future)
library(future.batchtools)
plan(batchtools_sge, template = "sge_batchtools.tmpl")
f <- future(Sys.sleep(5), resources = list(slots = 2))
# Error: Fatal error occurred: 101. Command 'qsub' produced exit code 2. Output: 'Unable to read script file because of error: ERROR! -pe option must have range as 2nd argument'

When I run debug(BatchtoolsFuture) on this example, I see that resources = list(slots = 2) is not actually passed to the BatchtoolsFuture() function. BatchtoolsFuture() appears to receive the default resources = list().
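To illustrate why an empty resources list would produce the qsub error above, here is a minimal base R sketch. It does not call brew or batchtools; render_pe_line() is a hypothetical stand-in for how the template line containing resources$slots is presumably filled in, on the assumption that a NULL value renders as nothing.

```r
# Hypothetical stand-in for rendering the template line
#   #$ -pe smp <%= resources$slots %>
# paste0() drops zero-length arguments, which mimics a NULL value
# being written into the template as nothing at all.
render_pe_line <- function(resources = list()) {
  paste0("#$ -pe smp ", resources$slots)
}

with_slots <- render_pe_line(list(slots = 2))  # slot count is present
without_slots <- render_pe_line(list())        # slot count is missing

print(with_slots)
print(without_slots)
```

With resources = list(), the line ends right after "smp", which is consistent with qsub complaining that the -pe option has no range as its second argument.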

Expected behavior

An SGE job with 2 slots should be submitted.

Session information

I am using the following commits of future and future.batchtools:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: <CENSORED>

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future.batchtools_0.10.0-9000 future_1.22.1-9000           

loaded via a namespace (and not attached):
 [1] parallelly_1.28.1 magrittr_2.0.1    hms_1.1.0         progress_1.2.2   
 [5] rappdirs_0.3.3    debugme_1.1.0     R6_2.5.1          brew_1.0-6       
 [9] rlang_0.4.11      fansi_0.5.0       globals_0.14.0    tools_4.0.3      
[13] parallel_4.0.3    checkmate_2.0.0   data.table_1.14.0 utf8_1.2.2       
[17] withr_2.4.2       ellipsis_0.3.2    base64url_1.4     digest_0.6.27    
[21] tibble_3.1.4      lifecycle_1.0.0   crayon_1.4.1      fs_1.5.0         
[25] vctrs_0.3.8       batchtools_0.9.15 codetools_0.2-18  stringi_1.7.4    
[29] pillar_1.6.2      compiler_4.0.3    backports_1.2.1   prettyunits_1.1.1
[33] listenv_0.8.0     pkgconfig_2.0.3  

Downgrading to future version 1.21.0 appears to have fixed the issue on my end. So could this have to do with the recent release of version 1.22.1 on August 25? Apologies for posting the issue in the wrong place.

Thanks @wlandau - I can confirm the behavior and that downgrading to future 1.21.0 fixes this in our environment.

Hi, this was actually never meant to work, because resources is not an argument of the Future API, and more specifically of future(). It only worked as a side effect of ... arguments being passed down to the underlying future backend constructor (here batchtools_sge). It is not meant to work because it breaks the core philosophy that future code should not make assumptions about which backend is being used.
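As a toy illustration of that side effect (this is not the actual source of future or future.batchtools; every function name below is made up), compare a front end that forwards ... to a backend constructor with one that does not:

```r
# Toy stand-in for a backend constructor; NOT the real BatchtoolsFuture().
toy_backend <- function(expr, resources = list()) {
  list(expr = expr, resources = resources)
}

# Variant 1: unknown arguments in `...` are handed on to the backend,
# so `resources` happens to work without being part of the front-end API.
toy_future_forwarding <- function(expr, ...) {
  toy_backend(expr, ...)
}

# Variant 2: `...` is accepted but not forwarded, so the backend
# only ever sees its default `resources = list()`.
toy_future_strict <- function(expr, ...) {
  toy_backend(expr)
}

a <- toy_future_forwarding(quote(Sys.sleep(5)), resources = list(slots = 2))
b <- toy_future_strict(quote(Sys.sleep(5)), resources = list(slots = 2))

print(a$resources)
print(b$resources)
```

In the second variant the backend receives an empty list, which matches what the debug(BatchtoolsFuture) session in the report shows.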

You can argue that being able to specify "resources" should be part of the Future API. There is indeed a plan to work on this, cf. https://www.futureverse.org/roadmap.html. With such an API, it would make sense for a developer to specify certain "resources" that the code requires in order to resolve a future.

A somewhat hacky workaround would be to do something like the following in targets:

the_plan <- plan("list")
strategy <- the_plan[[1]]
if (inherits(strategy, "batchtools_template")) {
  strategy <- tweak(strategy, resources = resources)
  the_plan[[1]] <- strategy
  old_plan <- plan(the_plan)
  on.exit(plan(old_plan), add = TRUE)
}
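Since on.exit() only takes effect inside a function, the snippet above is presumably meant to live in one. A minimal sketch of such a wrapper follows; future_with_resources() is a hypothetical helper, not part of future, future.batchtools, or targets, and it assumes the "batchtools_template" class check shown above.

```r
library(future)

# Hypothetical helper: create one future with `resources` applied to the
# first layer of the current plan, then restore the previous plan when
# this function exits.
future_with_resources <- function(expr, resources = list(),
                                  envir = parent.frame()) {
  expr <- substitute(expr)  # capture the caller's expression unevaluated
  force(envir)              # fix the caller's environment up front
  the_plan <- plan("list")
  strategy <- the_plan[[1]]
  if (inherits(strategy, "batchtools_template")) {
    the_plan[[1]] <- tweak(strategy, resources = resources)
    old_plan <- plan(the_plan)  # plan() returns the previous plan
    on.exit(plan(old_plan), add = TRUE)
  }
  future(expr, substitute = FALSE, envir = envir)
}

# Usage: with a non-batchtools plan the helper just creates a regular future.
plan(sequential)
f <- future_with_resources(1 + 1, resources = list(slots = 2))
v <- value(f)
```

The helper ignores resources for any backend that is not batchtools-based, so the calling code still runs there, just without the resource request.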

This also highlights my point above. Imagine there's another HPC backend that names the corresponding argument, say, needs. Then you would have to add conditional code for that as well. More importantly, if a developer used future(..., resources = ...), their code would not run as expected with this new backend. So, resources is not part of the Future API (yet).

PS. We have the related problem that batchtools doesn't have a strict definition of what resources should hold. So, it is really up to the person writing the template file to define what it should contain. It could differ between HPC templates, which would again cause the code to be hard-coded for a specific backend. The goal of adding support for "resource specifications" to the Future API is to at least identify a core set of well-defined specs (and a syntax for them).

Thanks for explaining, Henrik. targets is already capable of temporary plans, and I have updated the docs to encourage that kind of usage.

From my perspective as a user, it seemed safer to supply resources to the future instead of the plan because temporary plans seem difficult in the general case: for example, temporarily switching to a different plan may automatically shut down the PSOCK cluster of the original plan. targets currently relies on future::plan(.cleanup = FALSE), which, as you explained earlier, is not really officially supported either.

But I do understand the need for a consistent interface with appropriate semantic guardrails.