anniejw6/metrumrg

raised.keyed is slow, and misses some important assumptions.

Closed this issue · 1 comment

raised.keyed probably should not search columns in y that are already
defined in x, nor should it search columns that were already discovered
and merged in an earlier pass over a shorter key.  Consider the following
alternative.

raised.keyed <- function (x, y) {
    key <- key(y)
    message("serial left join of ", nrow(x), " rows and ", nrow(y), " rows on ", paste(key, collapse = ", "))
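    # non-key columns already present in x; these are never searched for in y
    known <- names(x)[!names(x) %in% key]
    # cumulative key prefixes: key[1], key[1:2], ..., the full key
    series <- lapply(seq_along(key), function(n) key[seq_len(n)])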
    for(i in seq_along(series)){
      key <- series[[i]]
      # This is by definition a fishing expedition, so don't look in y cols already defined in x
      y <- y[,setdiff(names(y),known),drop=FALSE]
      z <- static(y,on=key)
      new <- setdiff(names(z),key)
      known <- union(known,new)
      # if the merge fails at this key level, x is left unchanged and the loop moves on
      try(x <- stableMerge(x, z), silent = TRUE)
    }
    x
}
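
To see the pruning in isolation, here is a minimal base-R sketch that runs without metrumrg. It makes two assumptions that should be checked against the package: `static_sketch` is a hypothetical stand-in that guesses `static()` keeps the key columns plus any column constant within each key level, and `merge(..., all.x = TRUE)` is swapped in for `stableMerge()`. The column names and data are invented for illustration.

```r
# Hypothetical stand-in for static(): keep the key columns plus every other
# column that takes exactly one distinct value within each level of the key.
static_sketch <- function(y, on) {
  groups <- interaction(y[on], drop = TRUE)
  others <- setdiff(names(y), on)
  constant <- vapply(others, function(col) {
    all(tapply(y[[col]], groups, function(v) length(unique(v)) == 1))
  }, logical(1))
  unique(y[, c(on, others[constant]), drop = FALSE])
}

key <- c("study", "subject")
# cumulative key prefixes: "study", then c("study", "subject")
series <- lapply(seq_along(key), function(n) key[seq_len(n)])

x <- data.frame(study = c(1, 1, 2), subject = c(1, 2, 1), dose = c(10, 20, 10))
y <- data.frame(
  study   = c(1, 1, 2),
  subject = c(1, 2, 1),
  dose    = c(99, 99, 99),     # already defined in x, so it is never searched
  drug    = c("A", "A", "B"),  # constant within study
  weight  = c(70, 80, 65)      # constant only within study + subject
)

known <- setdiff(names(x), key)
for (k in series) {
  # drop columns already defined in x or found at a shorter key
  y <- y[, setdiff(names(y), known), drop = FALSE]
  z <- static_sketch(y, on = k)
  known <- union(known, setdiff(names(z), k))
  # plain left join here; the proposal above uses stableMerge instead
  x <- merge(x, z, by = k, all.x = TRUE, sort = FALSE)
}
```

In this toy case drug is picked up on the first pass (key = study) and is no longer in y on the second pass, while dose from x is never overwritten by the dose column in y.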

Original issue reported on code.google.com by bergs...@gmail.com on 11 Jan 2013 at 7:34

@5.29

Original comment by bergs...@gmail.com on 15 Jan 2013 at 4:25

  • Changed state: Fixed