stekhoven/missForest

Error message: Error in sample.int(length(x), size, replace, prob) : invalid first argument


Hi
What am I doing wrong here?

x2.imp <- missForest(x2)

x2 is a data frame, all numerical values, with some missing values. I get this error:

Error in sample.int(length(x), size, replace, prob) :
invalid first argument

but when I run with the iris dataset as in the tutorial, I get no errors. Would appreciate it if you could clarify the issue for me. cheers

@Mosen111 Why did you close this issue?

I guessed there was no one to answer, so I closed it.

@Mosen111: Did you find a solution to this problem? If not, it should be reopened.

Hi, I reopened it, but there is no one to answer here.

A data.frame of numbers with some missing values should run without error. For example the following code works for me:

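library(missForest)

v <- rnorm(20)                          # 20 random numeric values
v[sample(1:20, size=5)] <- NA           # blank out 5 of them at random
x2 <- as.data.frame(matrix(v, nrow=5))  # 5 x 4 numeric data frame
missForest(x2)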

Try debug(missForest) and investigate at what point the code fails, then compare it to what happens when using the iris data.
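If stepping through with debug() feels like too much, a lighter first check is to look at the class of each column and of the object itself. The sketch below uses only base R; the data frame x2 here is made-up example data, and the list of "expected" classes (numeric, integer, factor) is my assumption about what missForest handles, not something taken from its source code.

```r
# Hypothetical example data: one plain numeric column and one difftime column.
x2 <- data.frame(
  a = c(1.5, NA, 3.2, 4.1),
  d = as.Date(c("2020-01-10", "2020-02-01", NA, "2020-03-15")) -
    as.Date("2020-01-01")
)

# First class of every column.
col_classes <- vapply(x2, function(col) class(col)[1], character(1))
print(col_classes)

# Class of the container itself (a tibble would report "tbl_df" first).
print(class(x2))

# Columns whose class is outside the assumed "safe" set deserve a closer look.
safe <- c("numeric", "integer", "factor")
suspect <- names(col_classes)[!col_classes %in% safe]
print(suspect)
```

Any column listed in suspect, or a container class other than plain "data.frame" or "matrix", is a candidate for conversion before calling missForest().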

Thanks, but that is too technical for me. I have already found alternatives that work smoothly. Cheers

I had a similar issue before. It was caused by a sloppy type check in missForest and should be resolved in the development version, though the fix should probably go to CRAN as well.

same problem...

Dear Mosen111, what kind of alternative did you use?
I also have the same problem, and have no idea what to do!

In my case, I was computing the difference in days between two date variables, e.g.,
variable = d1 - d2. The result, although it looked like an integer, was of class "difftime". Using
variable = as.numeric(variable, units="days")

made missForest happy. I hope this helps.
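For anyone who wants to see the class change, here is a minimal base R sketch of that conversion. The dates d1 and d2 are invented for illustration; only the as.numeric(..., units = "days") call comes from the comment above.

```r
# Hypothetical dates; subtracting Date vectors yields a "difftime" object.
d1 <- as.Date(c("2021-03-01", "2021-03-15", NA))
d2 <- as.Date(c("2021-02-01", "2021-03-01", "2021-01-01"))

variable <- d1 - d2
print(class(variable))   # "difftime"

# Convert to a plain numeric vector measured in days.
variable <- as.numeric(variable, units = "days")
print(class(variable))   # "numeric"
print(variable)
```

Missing values survive the conversion as NA, so the column can still be imputed afterwards.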

Hi, your input must be a matrix. The following should work.
x2.imp <- missForest(as.matrix(x2))

I had the same problem. It turned out that missForest did not like the datetime class of my date variables. I converted them to numeric (which should be easier for any analysis done on them anyway) and that made the error go away. Hope this helps!
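A rough sketch of that kind of conversion, using base R only. The column names and values are made up, and treating every Date or POSIXct column this way is an assumption that may not suit every dataset. Note that as.numeric() gives days since 1970-01-01 for Date and seconds since 1970-01-01 for POSIXct.

```r
# Hypothetical data frame with a Date column, a POSIXct column and a numeric one.
x2 <- data.frame(
  visit = as.Date(c("2020-01-01", NA, "2020-01-03")),
  stamp = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 00:01:00", NA),
                     tz = "UTC"),
  value = c(1.2, 3.4, NA)
)

# Flag the date/datetime columns.
is_time <- vapply(x2, function(col) inherits(col, c("Date", "POSIXct")),
                  logical(1))

# Replace them with plain numeric vectors; NAs are preserved.
x2[is_time] <- lapply(x2[is_time], as.numeric)

print(vapply(x2, function(col) class(col)[1], character(1)))
```

After this step every column is plain numeric, which is the input described in the original question.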

Having just run into the same issue, here's the not-so-obvious but simple cause / solution:

I happen to load my data via read_csv() to take advantage of the tibble type. This is the actual culprit. Applying as.data.frame() ahead of the missForest() call solved my problem. Alternatively, you could simply fall back to the old way of loading data via read.csv().

Applying as.data.frame() prior to running missForest solved my problem as well.
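A small sketch of the as.data.frame() workaround, for reference. It assumes the tibble package may or may not be installed, so it falls back to an ordinary data frame when it is missing; the data values are invented, and the missForest() call itself is left as a comment because it needs the missForest package.

```r
# Build example data as a tibble when the tibble package is available,
# otherwise as an ordinary data frame (the conversion is then a no-op).
if (requireNamespace("tibble", quietly = TRUE)) {
  x2 <- tibble::tibble(a = c(1.5, NA, 3.2), b = c(NA, 2.0, 4.4))
} else {
  x2 <- data.frame(a = c(1.5, NA, 3.2), b = c(NA, 2.0, 4.4))
}
print(class(x2))

# Drop the tibble classes so that only "data.frame" remains.
x2_df <- as.data.frame(x2)
print(class(x2_df))

# x2.imp <- missForest::missForest(x2_df)   # requires the missForest package
```

The same conversion applies to data loaded with readr's read_csv(), since that function returns a tibble.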

We intend to move to ranger soon; this will make missForest able to handle tibbles.