edwindj/ffbase

error in unique


Setup (Windows 10):

  • platform x86_64-w64-mingw32
  • arch x86_64
  • os mingw32
  • system x86_64, mingw32
  • status
  • major 3
  • minor 6.1
  • year 2019
  • month 07
  • day 05
  • svn rev 76782
  • language R
  • version.string R version 3.6.1 (2019-07-05)
  • nickname Action of the Toes

When running the unique sample from CRAN I get:

unique.ff

unique(ffiris$Sepal.Length)
Error in if (by < 1) stop("'by' must be > 0") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 300L, by = c(double = 23058430092136940), :
NAs introduced by coercion to integer range
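
A minimal base-R sketch of why this surfaces as "missing value where TRUE/FALSE needed". It assumes, as the warning suggests, that the oversized `by` is coerced to integer before the `by < 1` check is evaluated. It does not use ff; the value is copied from the warning above.

```r
# Value of `by` reported in the warning above; it is a double, far above
# the largest integer R can represent.
by <- 23058430092136940
exceeds_int <- by > .Machine$integer.max

# Coercing it to integer yields NA (normally with the
# "NAs introduced by coercion to integer range" warning).
by_int <- suppressWarnings(as.integer(by))

# A comparison with NA is NA, and `if (NA)` raises the error seen above.
msg <- tryCatch(
  {
    if (by_int < 1) stop("'by' must be > 0")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the `'by' must be > 0` check never gets to run; the `if` itself fails because its condition is NA.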

ffbase version : 0.12.7
ff version : 2.2-14
bit version : 1.1-14
fastmatch version : 1.1-0

The same example works fine on the following setup:

Setup (Windows 7):

  • platform x86_64-w64-mingw32
  • arch x86_64
  • os mingw32
  • system x86_64, mingw32
  • status
  • major 3
  • minor 5.1
  • year 2018
  • month 07
  • day 02
  • svn rev 74947
  • language R
  • version.string R version 3.5.1 (2018-07-02)
  • nickname Feather Spray

Thanks for reporting!
Seems related to #56 . Will dig into it later this week.

I cannot reproduce the bug on Rhub (which runs on Windows 2008 SP2), but don't despair...

Technically this is in the realm of ff (and not ffbase), but from the error message and a glance at the ff code (which is not mine) I do have a hunch about what the problem might be.

ff uses chunking to process large vectors and data.frames. The size of a chunk is determined by the option "ffbatchbytes". It seems that on your Windows 10 machine(s) the value for this option isn't set correctly. Maybe that is because you are using 32-bit R (so one option is to switch to 64-bit).

ff sets this value automatically when library(ff) is called (see the following code):

copied from ff:::.onLoad()

    if (is.null(getOption("ffmaxbytes"))) {
        if (.Platform$OS.type == "windows") {
            if (getRversion() >= "2.6.0") 
                options(ffmaxbytes = 0.5 * memory.limit() * (1024^2))
            else options(ffmaxbytes = 0.5 * memory.limit())
        }
        else {
            options(ffmaxbytes = 0.5 * 1024^3)
        }
    }

I suggest you set options(ffmaxbytes) manually and try to run the examples again.

# e.g. 500MB
options(ffmaxbytes =  500 * (1024^2))
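
To see what a given session actually uses, something like the following could be run in both RGui and RStudio after library(ff). It is only a sketch: `describe_ff_option` is a hypothetical helper, not part of ff. It inspects both "ffmaxbytes" and "ffbatchbytes", since the explanation above names the latter as the option that determines the chunk size.

```r
# Hypothetical helper (not part of ff): report an option's value and whether
# it is a single finite number that still fits into R's integer range.
describe_ff_option <- function(name) {
  value <- getOption(name)
  ok <- is.numeric(value) && length(value) == 1L &&
    is.finite(value) && value <= .Machine$integer.max
  data.frame(
    option = name,
    value = if (is.numeric(value)) as.numeric(value)[1] else NA_real_,
    fits_integer = ok
  )
}

# Run this after library(ff) so that ff has had a chance to set its defaults;
# without ff loaded, both options are simply reported as unset (NA).
report <- do.call(
  rbind,
  lapply(c("ffmaxbytes", "ffbatchbytes"), describe_ff_option)
)
print(report)
```

A value for "ffbatchbytes" that is missing or absurdly large in one front end but not in the other would point at how the default is computed in that environment, rather than at the example itself.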

Hi Edwin.

Thank you for the feedback. The solution is not working but we're a step closer.

This is the situation at the moment (all on Windows 10):

  • RStudio with R 3.5.1 - Working
  • RGui 3.6.1 - Working (as is and with suggested options)
  • RStudio with R 3.6.1 - Not working (with and without suggested options)

I'm selecting the 64-bit version of R in RStudio.

Regards.

I don't have a Windows 10 machine myself, but the problem clearly comes from ff::chunk, namely from ff::chunk.ff_vector, whose definition is reproduced at the end of this comment.

The relevant part is this: b <- BATCHBYTES%/%RECORDBYTES. On your machine this calculation apparently gives 23058430092136940, for reasons beyond my understanding (given that you report it works in RGui but not in RStudio).

You could probably work around this by setting the option ffbatchbytes explicitly, e.g. options(ffbatchbytes = 84882227), which is the value I have on my old-school Windows 7 machine.

function (x, RECORDBYTES = .rambytes[vmode(x)], BATCHBYTES = getOption("ffbatchbytes"), 
    ...) 
{
    n <- length(x)
    if (n) {
        l <- list(...)
        if (is.null(l$from)) 
            l$from <- 1L
        if (is.null(l$to)) 
            l$to <- n
        if (is.null(l$by) && is.null(l$len)) {
            b <- BATCHBYTES%/%RECORDBYTES
            if (b == 0L) {
                b <- 1L
                warning("single record does not fit into BATCHBYTES")
            }
            l$by <- b
        }
        l$maxindex <- n
        ret <- do.call("chunk.default", l)
    }
    else {
        ret <- list()
    }
    ret
}
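
To illustrate the calculation on the line b <- BATCHBYTES%/%RECORDBYTES, here is a small sketch in base R. It assumes RECORDBYTES is 8 for vmode "double" (the size of a double; the actual `.rambytes` table lives in ff and is not reproduced here). `batch_size` is a hypothetical stand-in for that part of the function, not an ff function.

```r
# Assumption: a double occupies 8 bytes in RAM.
record_bytes <- 8

# Hypothetical stand-in for the batch-size calculation in chunk.ff_vector.
batch_size <- function(batch_bytes, record_bytes) {
  b <- batch_bytes %/% record_bytes
  if (b == 0) b <- 1  # mirror the fallback to one record per chunk
  b
}

# With the workaround value, the batch size is a modest number of records.
b_ok <- batch_size(84882227, record_bytes)

# Working backwards from the reported `by`: the BATCHBYTES that would produce
# it (under the 8-byte assumption) gives a batch size far outside the
# integer range, which is what the later integer coercion trips over.
implied_batch_bytes <- 23058430092136940 * record_bytes
b_bad <- batch_size(implied_batch_bytes, record_bytes)

print(c(b_ok = b_ok, b_bad = b_bad))
print(c(ok_fits = b_ok <= .Machine$integer.max,
        bad_fits = b_bad <= .Machine$integer.max))
```

Under these assumptions, 84882227 bytes gives 10610278 records per chunk, which easily covers the 300 elements in the failing example.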