Object oriented code?
Base R methods such as aggregate, split, mean, and [ have corresponding functions distribagg, distribsplit, distribmeans, and distribgetrows in partools.
It might be nice to have a distributed object, like a distributed data frame, and write the partools versions as methods.
Thoughts?
Benefits
- Initially more familiar to users, which will help with adoption
- Well defined examples in base R to follow make it natural to extend this approach to other methods
- May simplify some partools code and make it more flexible if we can pass arguments directly through to methods on worker nodes using a mechanism such as ...
Costs
- Likely a few weeks of initial coding
- Might end up creating and maintaining two versions of code which do the same thing
I've been thinking about this as I read Chambers' Extending R book (very good by the way).
I am just raising the issue to discuss and see if this is a possibility, or something that was considered and rejected.
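To make the proposal concrete, here is a rough sketch of what an S3 version might look like. Everything named here is hypothetical and not part of partools: the distribdf class, its constructor, and the aggregate.distribdf method are assumptions made for illustration, and the example uses only the parallel package with mtcars split over two local workers. The method forwards ... unchanged to aggregate on each worker and returns one result per chunk.

```r
library(parallel)

# Hypothetical constructor (not in partools): records the cluster and the
# name under which each worker holds its chunk of the data frame.
distribdf <- function(cls, dataname) {
  structure(list(cls = cls, dataname = dataname), class = "distribdf")
}

# Hypothetical S3 method: run base aggregate() on every worker's chunk,
# passing ... straight through. Returns a list with one data frame per worker.
aggregate.distribdf <- function(x, formula, FUN, ...) {
  clusterCall(x$cls, function(dataname, formula, FUN, ...) {
    chunk <- get(dataname, envir = .GlobalEnv)
    stats::aggregate(formula, data = chunk, FUN = FUN, ...)
  }, x$dataname, formula, FUN, ...)
}

cls <- makeCluster(2)

# Stand-in for distributing the data: give each worker half of mtcars as "mt".
halves <- split(mtcars, rep(1:2, length.out = nrow(mtcars)))
invisible(clusterApply(cls, halves, function(chunk) {
  assign("mt", chunk, envir = .GlobalEnv)
  NULL
}))

d <- distribdf(cls, "mt")
chunks <- aggregate(d, mpg ~ cyl, sum)   # dispatches to aggregate.distribdf

# Sums can be combined by aggregating the per-chunk results once more.
total <- aggregate(mpg ~ cyl, do.call(rbind, chunks), sum)

stopCluster(cls)
```

Combining per-chunk results is only this simple for functions like sum; something like mean would need counts as well, which may be part of why distribagg has extra arguments.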
For example, compare the arguments of aggregate to distribagg().
Could you explain further? Here are the relevant signatures:
## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
subset, na.action = na.omit)
## Partools:
distribagg(cls,ynames,xnames,dataname,FUN,FUNdim=1,FUN1=FUN)
It would be useful if you were to add some distrib*() function yourself, to see the problems.
I just had a go at writing a function to add a new column to a data frame which is already distributed, say az. The call was looking something like this:
distribsetcol(cls, dataname = az, colname = LETTER, FUN = function(x) toupper(x$letter))
where FUN is a function expecting the chunk of az on that node. Then I realized that it's cleaner and easier to use what the parallel package already provides:
clusterEvalQ(cls, {
    az$LETTER <- toupper(az$letter)
    NULL  # Return NULL so the new column is not sent back to the caller
})
Maybe I'll add something like this into the docs.
Example is in the vignette now.
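For anyone who wants to try the clusterEvalQ approach without an existing distributed data frame, here is a self-contained sketch. It is not the vignette example: the two-worker local cluster, the toy az data, and the use of clusterApply to place the chunks are all assumptions made so that the snippet runs on its own.

```r
library(parallel)

cls <- makeCluster(2)

# Toy stand-in for a distributed data frame: each worker holds a chunk named az.
az <- data.frame(letter = letters[1:6], stringsAsFactors = FALSE)
chunks <- split(az, rep(1:2, each = 3))
invisible(clusterApply(cls, chunks, function(chunk) {
  assign("az", chunk, envir = .GlobalEnv)
  NULL
}))

# Add the new column in place on every worker.
invisible(clusterEvalQ(cls, {
  az$LETTER <- toupper(az$letter)
  NULL  # keep the workers from sending the column back
}))

# Collect the chunks only to check the result.
result <- do.call(rbind, clusterEvalQ(cls, az))

stopCluster(cls)
```

Here result should have both the letter and LETTER columns, with the rows from the two workers stacked in order.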