matloff/partools

Object oriented code?

Opened this issue · 9 comments

Base R methods such as aggregate, split, mean, [ have corresponding functions distribagg, distribsplit, distribmeans, distribgetrows in partools.

It might be nice to have a distributed object, like a distributed data frame, and write the partools versions as methods.

Thoughts?

Benefits

  • Initially more familiar to users, which will help with adoption
  • Well-defined examples in base R make it natural to extend this approach to other methods
  • May simplify some partools code and make it more flexible, if we can pass arguments straight through to the methods on the worker nodes using a mechanism such as the ... argument
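
To make the idea concrete, here is a rough sketch of what a distributed data frame with an S3 method might look like. None of this exists in partools: the ddf class, its constructor, and the .ddf_head_chunk helper are hypothetical names, and clusterApply() merely stands in for whatever would really distribute the data. It only shows how ... could carry arguments through to the base R method on each worker.

```r
library(parallel)

# Hypothetical constructor: a "ddf" holds only the cluster and the name
# under which each worker stores its chunk of the data frame.
ddf <- function(cls, dataname) {
    structure(list(cls = cls, dataname = dataname), class = "ddf")
}

# Worker-side helper (hypothetical), defined at top level so that sending it
# to the workers does not drag along a function's local environment.
.ddf_head_chunk <- function(dataname, ...) {
    utils::head(get(dataname, envir = .GlobalEnv), ...)
}

# Hypothetical S3 method: run head() on every chunk, forwarding ... unchanged,
# then stack the per-worker results on the manager.
head.ddf <- function(x, ...) {
    res <- clusterCall(x$cls, .ddf_head_chunk, x$dataname, ...)
    do.call(rbind, res)
}

cls <- makeCluster(2)

# Made-up chunks; clusterApply() sends element i to worker i.
chunks <- list(data.frame(letter = c("a", "b", "c")),
               data.frame(letter = c("d", "e", "f")))
invisible(clusterApply(cls, chunks, function(chunk) {
    assign("az", chunk, envir = .GlobalEnv)
    NULL
}))

az_d <- ddf(cls, "az")
first <- head(az_d, n = 2)   # n = 2 reaches head() on each worker via ...

stopCluster(cls)
```

A real design would also have to decide how per-worker results get combined for functions where rbind() is not the right answer, which is presumably where much of the work lies.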

Costs

  • Likely a few weeks of initial coding
  • Might end up creating and maintaining two versions of code which do the same thing

I've been thinking about this as I read Chambers' Extending R book (very good by the way).

I am just raising the issue to discuss and see if this is a possibility, or something that was considered and rejected.

For example, compare the arguments of aggregate to distribagg().

Could you explain further? Here are the relevant signatures:

## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
           subset, na.action = na.omit)

## Partools:
distribagg(cls,ynames,xnames,dataname,FUN,FUNdim=1,FUN1=FUN)

It would be useful if you were to add some distrib*() function yourself, to see the problems.

I just had a go at writing a function to add a new column to an already distributed data frame, say az. The call looked something like this:

distribsetcol(cls, dataname = az, colname = LETTER, FUN = function(x) toupper(x$letter))

where FUN is a function that receives the chunk of az held on that node. Then I realized that it's cleaner and easier to use what the parallel package already provides:

    clusterEvalQ(cls, {
        az$LETTER <- toupper(az$letter)
        NULL  # Return NULL so the new column is not sent back from each worker
    })
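
For anyone who wants to try this end to end, here is a small self-contained version. It assumes a local two-worker cluster, and the chunk contents are made up; clusterApply() is used here only to put a chunk named az on each worker, in place of whatever actually distributed the data.

```r
library(parallel)

cls <- makeCluster(2)

# Give each worker its own chunk, stored under the name az
# (illustrative data, not from the original example).
chunks <- list(data.frame(letter = c("a", "b")),
               data.frame(letter = c("c", "d")))
invisible(clusterApply(cls, chunks, function(chunk) {
    assign("az", chunk, envir = .GlobalEnv)
    NULL
}))

# Add the column in place on every worker; NULL keeps the return value small.
invisible(clusterEvalQ(cls, {
    az$LETTER <- toupper(az$letter)
    NULL
}))

# Pull the new column back to the manager only to check the result.
upper <- unlist(clusterEvalQ(cls, az$LETTER))

stopCluster(cls)
```

The data never leaves the workers except in the final check, which is the point of doing the assignment inside clusterEvalQ().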

Maybe I'll add something like this into the docs.

Example is in the vignette now.