More helper functions

Question

More helper functions

Opened this issue 9 years ago · 3 comments

I think the code in partools could be made substantially more elegant with a few helper functions:

clusterRequireNamespace <- function(cl, pkg) {
  clusterCall(cl, requireNamespace, package = pkg, quietly = TRUE)
}
clusterRequirePackage <- function(cl, pkg) {
  clusterCall(cl, library, package = pkg, character.only = TRUE)
}
clusterEval < function(cl, expr) {
  clusterCall(cl, eval, expr, env = .GlobalEnv)
}

clusterAssign <- function(cl, name, value) {
  clusterCall(cl, assign, x = name, value = value)
}

asCall <- function(x) {
  if (inherits(x, "formula")) {
    stopifnot(length(x) == 2)
    x[[2]]
  } else if (is.atomic(x) || is.name(x) || is.call(x)) {
    x
  } else {
    stop("Unknown input")
  }
}

clusterEvalI <- function(cl, expr, ...) {
  args <- lapply(list(...), asCall)
  interpolated <- methods::substituteDirect(asCall(expr), args)

  clusterEval(cl, interpolated)
}

This allows (e.g.) fileread to be rewritten as:

fileread <- function (cls, fname, dname, ndigs, header = FALSE, sep = " ", 
                      usefread = FALSE) {
  if (usefread) {
    clusterRequireNamespace(cls, "data.table")
    clusterEvalI(cls, ~ myfread <- data.table::fread))
  } else {
    clusterEvalI(cls, ~ myfread <- read.table))
  }

  clusterEvalI(cl,
    ~ `_mychunk` <- filechunkname(fname, ndigs), 
    fname = fname, ndig
    s = ndgs
  )

  clusterEvalI(cls,
    ~ dname <- myfread(`_mychunk`, header = header, sep = sep),
    dname = as.name(dname),
    header = header,
    sep = sep
  )
}

which I think is rather more clear since it focusses on the computation, rather than the details of the call manipulation (and working with the AST is obviously more elegant than working with strings)

(Also note the use of _mychunk as a variable name - using a non syntactic variable should decrease the chances of accidentally clobbering an object that the user created)

I haven't tested this code yet, but I'm happy to do so if you're interested in the idea.

Answer 1 · 2015-08-16T16:39:16.000Z

Thanks, I'll look into this. Actually, there are lots of ways in which the code could be made more elegant. :-)

Answer 2 · 2017-05-17T00:37:14.000Z

Making changes now to make these sorts of functions more clear. For example:

Before:

filesave <- function(cls,dname,newbasename,ndigs,sep) {

# what will the new file be called at each node?
tmp <- paste('"',newbasename,'",',ndigs,sep='')
cmd <- paste('myfilename <- filechunkname(',tmp,')',sep='')
clusterExport(cls,"cmd",envir=environment())
clusterEvalQ(cls,eval(parse(text=cmd)))
# start building the write.table() call
tmp <- paste(dname,'myfilename',sep=',')
# what will the column names be for the new files?
clusterEvalQ(cls,eval(parse(text=cmd)))
cncmd <- paste('colnames(',dname,')',sep='')
clusterExport(cls,"cncmd",envir=environment())
clusterEvalQ(cls,cnames <- eval(parse(text=cncmd)))[[1]]
# now finish pasting the write.table() command, and run it
writecmd <- paste('write.table(',tmp,
   ',row.names=FALSE,col.names=cnames,sep="',sep,'")',sep='')
clusterExport(cls,"writecmd",envir=environment())
clusterEvalQ(cls,eval(parse(text=writecmd)))

After:

filesave <- function(cls,dname,newbasename,ndigs,sep, ...) {
    write_one_chunk <- function(dname, newbasename, ndigs, sep, ...)
    {   
        write.table(get(dname)
                    , file = filechunkname(newbasename, ndigs)
                    , sep = sep 
                    , ...)
    }   

    clusterCall(cls, write_one_chunk, dname, newbasename, ndigs, sep, ...)
}

This was motivated by needing to pass more arguments into the underlying functions like write.table(). If there are no objections I'd like to continue refactoring the code in this way.

Answer 3 · 2017-05-17T02:15:24.000Z

That's fine. Looking forward to hearing how well it all works. Norm

…

On Tue, May 16, 2017 at 05:37:14PM -0700, Clark Fitzgerald wrote: Making changes now to make these sorts of functions more clear. For example: Before: ```R filesave <- function(cls,dname,newbasename,ndigs,sep) { # what will the new file be called at each node? tmp <- paste('"',newbasename,'",',ndigs,sep='') cmd <- paste('myfilename <- filechunkname(',tmp,')',sep='') clusterExport(cls,"cmd",envir=environment()) clusterEvalQ(cls,eval(parse(text=cmd))) # start building the write.table() call tmp <- paste(dname,'myfilename',sep=',') # what will the column names be for the new files? clusterEvalQ(cls,eval(parse(text=cmd))) cncmd <- paste('colnames(',dname,')',sep='') clusterExport(cls,"cncmd",envir=environment()) clusterEvalQ(cls,cnames <- eval(parse(text=cncmd)))[[1]] # now finish pasting the write.table() command, and run it writecmd <- paste('write.table(',tmp, ',row.names=FALSE,col.names=cnames,sep="',sep,'")',sep='') clusterExport(cls,"writecmd",envir=environment()) clusterEvalQ(cls,eval(parse(text=writecmd))) ``` After: ```R filesave <- function(cls,dname,newbasename,ndigs,sep, ...) { write_one_chunk <- function(dname, newbasename, ndigs, sep, ...) { write.table(get(dname) , file = filechunkname(newbasename, ndigs) , sep = sep , ...) } clusterCall(cls, write_one_chunk, dname, newbasename, ndigs, sep, ...) } ``` This was motivated by needing to pass more arguments into the underlying functions like `write.table()`. If there are no objections I'd like to continue refactoring the code in this way. -- You are receiving this because you commented. Reply to this email directly or view it on GitHub: #2 (comment)