bwlewis/doRedis

doRedis/rredis breaking on large data sets

Closed this issue · 6 comments

When I run a foreach loop through doRedis on larger data sets, I encounter this error:

Error in readBin(con, raw(), 1000000L) :
 negative length vectors are not allowed

This error is reproducible and is tied directly to the size of the data frame being passed. Factor columns seem to cause more trouble: I can pass larger data frames (in both dimensions and memory) that have only int or num columns. The top of my stack trace is:

"readBin(con, raw(), 1000000L)"

[[2]]
[1] ".burn(\"Empty\")"

[[3]]
[1] "doTryCatch(return(expr), name, parentenv, handler)"

[[4]]
[1] "tryCatchOne(expr, names, parentenv, handlers[[1L]])"

[[5]]
[1] "tryCatchList(expr, classes, parentenv, handlers)"

[[6]]
 [1] "tryCatch({"                                                                                           
 [2] "    con <- .redis()"                                                                                  
 [3] "    l <- readLines(con = con, n = 1)"                                                                 
 [4] "    if (length(l) == 0) "                                                                             
 [5] "        .burn(\"Empty\")"                                                                             
 [6] "    tryCatch(env$count <- max(env$count - 1, 0), error = function(e) assign(\"count\", "              
 [7] "        0, envir = env))"                                                                             
 [8] "    s <- substr(l, 1, 1)"                                                                             
 [9] "    if (nchar(l) < 2) {"                                                                              
[10] "        if (s == \"+\") {"                                                                            
[11] "            return(\"\")"                                                                             
[12] "        }"                                                                                            
[13] "        .burn(\"Invalid\")"                                                                           
[14] "    }"                                                                                                
[15] "    switch(s, `-` = stop(substr(l, 2, nchar(l))), `+` = substr(l, "                                   
[16] "        2, nchar(l)), `:` = as.numeric(substr(l, 2, nchar(l))), "                                     
[17] "        `$` = {"                                                                                      
[18] "            n <- as.numeric(substr(l, 2, nchar(l)))"                                                  
[19] "            if (n < 0) {"                                                                             
[20] "                return(NULL)"                                                                         
[21] "            }"                                                                                        
[22] "            dat <- tryCatch(readBin(con, \"raw\", n = n), error = function(e) .redisError(e$message))"
[23] "            m <- length(dat)"                                                                         
[24] "            if (m == n) {"                                                                            
[25] "                l <- readLines(con, n = 1)"                                                           
[26] "                if (raw) return(dat) else return(tryCatch(unserialize(dat), "                         
[27] "                  error = function(e) rawToChar(dat)))"                                               
[28] "            }"                                                                                        
[29] "            rlen <- 50"                                                                               
[30] "            j <- 1"                                                                                   
[31] "            r <- vector(\"list\", rlen)"                                                              
[32] "            r[j] <- list(dat)"                                                                        
[33] "            while (m < n) {"                                                                          
[34] "                dat <- tryCatch(readBin(con, \"raw\", n = (n - "                                      
[35] "                  m)), error = function(e) .redisError(e$message))"                                   
[36] "                j <- j + 1"                                                                           
[37] "                if (j > rlen) {"                                                                      
[38] "                  rlen <- 2 * rlen"                                                                   
[39] "                  length(r) <- rlen"                                                                  
[40] "                }"                                                                                    
[41] "                r[j] <- list(dat)"                                                                    
[42] "                m <- m + length(dat)"                                                                 
[43] "            }"                                                                                        
[44] "            l <- readLines(con, n = 1)"                                                               
[45] "            length(r) <- j"                                                                           
[46] "            if (raw) do.call(c, r) else tryCatch(unserialize(do.call(c, "                             
[47] "                r)), error = function(e) rawToChar(do.call(c, "                                       
[48] "                r)))"                                                                                 
[49] "        }, `*` = {"                                                                                   
[50] "            numVars <- as.integer(substr(l, 2, nchar(l)))"                                            
[51] "            if (numVars > 0L) {"                                                                      
[52] "                replicate(numVars, .getResponse(raw = raw), simplify = FALSE)"                        
[53] "            } else NULL"                                                                              
[54] "        }, stop(\"Unknown message type\"))"                                                           
[55] "}, interrupt = function(e) .burn(e))"                                                                 

[[7]]
[1] ".getResponse()"

[[8]]
[1] ".redisCmd(.raw(cmd), .raw(key), value)"

[[9]]
[1] "redisSet(queueEnv, list(expr = expr, exportenv = exportenv, packages = obj$packages))"

[[10]]
[1] "e$fun(obj, substitute(ex), parent.frame(), e$data)"

This is possibly related to the redis maximum value size issue that others have reported. Values in Redis are limited to less than 512MB.
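
To check whether an object is anywhere near that limit, one can measure the byte stream that R's `serialize()` produces for it. The following is only a rough sketch: `serialized_bytes` and `redis_limit` are names made up for illustration, the data is synthetic, and the payload doRedis actually sends (judging by the `redisSet` call in the trace above) is a list holding the expression, the export environment, and the package names, so it can be larger than the data frame alone.

```r
# Hypothetical helper: number of bytes serialize() produces for x.
serialized_bytes <- function(x) length(serialize(x, connection = NULL))

# The 512 MB value limit mentioned above, in bytes.
redis_limit <- 512 * 1024^2

set.seed(1)
n <- 1e4  # kept small so the sketch runs quickly; scale up to probe real sizes

# Same values stored once as integer columns and once as factor columns.
ints <- data.frame(a = sample.int(100L, n, replace = TRUE),
                   b = sample.int(100L, n, replace = TRUE))
facs <- data.frame(a = factor(ints$a), b = factor(ints$b))

sizes <- c(ints = serialized_bytes(ints), factors = serialized_bytes(facs))
print(sizes)                # serialized size of each variant, in bytes
print(sizes < redis_limit)  # TRUE where a single value would fit
```

Running the same comparison on the real data frame, and on a copy with the factor columns converted to integers, would show whether the factor columns account for the size difference.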

Yup, the interesting thing I think is that factors seem to hit that limit
much, much sooner, e.g. I took a set twice the size and made it all ints
and it worked fine. Then I made a few of the columns factors, halved the
set and it broke. I wonder if that can somehow be improved?

On Mon, Jun 17, 2013 at 1:42 PM, B. W. Lewis notifications@github.com wrote:

> This is possibly related to the redis maximum value size issue that others
> have reported. Values in Redis are limited to less than 512MB.

Can you send me self-contained repro code that generates dummy example data? I'd like to figure out what's going on...

I can't... my data is proprietary, sorry. If you want, generate any large set
(say 500K-1M rows x 40 columns or so) with a single classifier column as a factor. Make
the rest of the columns ints, and it should go through. Then convert a few of
the data columns to factors (more than one) and it should break. Let me
know if that does it. If it doesn't, I can try to create actual dummy data
for you that breaks.
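
The recipe above could be sketched roughly as follows. This is an assumption-laden illustration, not the reporter's actual data: `make_dummy`, its parameters, the value range, and the class labels are all made up, and `run_via_doRedis` is a hypothetical wrapper that is defined but not called, since it needs a running Redis server and at least one doRedis worker.

```r
# Hypothetical generator: n_cols integer columns plus one factor "class" column.
# The first n_factor_cols data columns are converted to factors.
make_dummy <- function(n_rows = 500000L, n_cols = 40L, n_factor_cols = 0L) {
  stopifnot(n_factor_cols <= n_cols)
  cols <- replicate(n_cols, sample.int(1000L, n_rows, replace = TRUE),
                    simplify = FALSE)
  names(cols) <- paste0("x", seq_len(n_cols))
  df <- as.data.frame(cols)
  for (j in seq_len(n_factor_cols)) df[[j]] <- factor(df[[j]])
  df$class <- factor(sample(c("a", "b", "c"), n_rows, replace = TRUE))
  df
}

set.seed(42)
# Small row counts here so the sketch runs quickly; use the defaults
# (or larger) to approach the sizes described in the thread.
ints_only    <- make_dummy(n_rows = 1000L)
with_factors <- make_dummy(n_rows = 1000L, n_factor_cols = 3L)
str(with_factors[, c(1:4, 41)])

# Not run here: requires a Redis server and a doRedis worker on the queue.
run_via_doRedis <- function(df, queue = "jobs") {
  library(doRedis)
  registerDoRedis(queue)
  on.exit(removeQueue(queue))
  # df is referenced in the loop body, so foreach exports it to the workers.
  foreach(j = 1:2, .combine = c) %dopar% nrow(df)
}
```

Calling `run_via_doRedis(ints_only)` and then `run_via_doRedis(with_factors)` at full size would be one way to test whether the factor conversion alone triggers the error.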

On Mon, Jun 17, 2013 at 2:39 PM, B. W. Lewis notifications@github.com wrote:

> Can you send me self-contained repro code that generates dummy example
> data? I'd like to figure out what's going on...

That sounds good, I'll try that and let you know what I find.

The new vignette has a section dedicated to this issue: Redis puts limits on value sizes. See the vignette for work-arounds.
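
The vignette itself is not reproduced in this thread, so the following is only a general sketch of one way to stay under a value-size limit, and not necessarily what the vignette recommends. The idea is to keep the large data frame out of the exported environment: write it once to a location every worker can read, and let each task carry only the path and a vector of row indices. All names here (`big`, `path`, `chunks`, `task`) are hypothetical, a temporary file stands in for shared storage, and `lapply` stands in for the `foreach(...) %dopar%` loop so the sketch runs without Redis.

```r
# Synthetic stand-in for the large data frame.
big <- data.frame(x = 1:1000, g = factor(rep(c("a", "b"), 500)))

# Stand-in for a path that every worker can read (e.g. shared storage).
path <- tempfile(fileext = ".rds")
saveRDS(big, path)

# Row-index chunks: these small vectors are all each task needs to carry.
chunks <- split(seq_len(nrow(big)), rep(1:4, length.out = nrow(big)))

# Task body: load the data locally, then work on the assigned rows only.
task <- function(path, rows) {
  d <- readRDS(path)
  sum(d$x[rows])
}

# lapply stands in for: foreach(rows = chunks) %dopar% task(path, rows)
res <- lapply(chunks, task, path = path)
total <- sum(unlist(res))
print(total)
```

In this layout only the path and the index vectors would pass through Redis, at the cost of each task reading the file itself.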