mlr-org/parallelMap

could not find function "dir.exists"

rubenohayon opened this issue · 13 comments

Hi,

I am using parallelMap in Azure Machine Learning, but I get this error when I call this function:
parallelStartSocket

Thanks

That's from base R.

Can you please post sessionInfo()?

[ModuleOutput] [1] '2.7'
[ModuleOutput]
[ModuleOutput] R version 3.1.0 (2014-04-10)
[ModuleOutput] Platform: x86_64-w64-mingw32/x64 (64-bit)
[ModuleOutput]
[ModuleOutput] locale:
[ModuleOutput] [1] LC_COLLATE=English_United States.1252
[ModuleOutput] [2] LC_CTYPE=English_United States.1252
[ModuleOutput] [3] LC_MONETARY=English_United States.1252
[ModuleOutput] [4] LC_NUMERIC=C
[ModuleOutput] [5] LC_TIME=English_United States.1252
[ModuleOutput]
[ModuleOutput] attached base packages:
[ModuleOutput] [1] splines grid parallel stats graphics grDevices utils
[ModuleOutput] [8] datasets methods base
[ModuleOutput]
[ModuleOutput] other attached packages:
[ModuleOutput] [1] Hmisc_3.14-4 Formula_1.1-1 survival_2.37-7 lattice_0.20-29
[ModuleOutput] [5] mlr_2.7 ggplot2_1.0.0 parallelMap_1.3 ParamHelpers_1.6
[ModuleOutput] [9] BBmisc_1.9 checkmate_1.7.0 Metrics_0.1.1 xgboost_0.4-2
[ModuleOutput] [13] doParallel_1.0.10 iterators_1.0.7 foreach_1.4.2 magrittr_1.5
[ModuleOutput]
[ModuleOutput] loaded via a namespace (and not attached):
[ModuleOutput] [1] chron_2.3-45 cluster_1.15.2 codetools_0.2-8
[ModuleOutput] [4] colorspace_1.2-4 data.table_1.9.4 digest_0.6.4
[ModuleOutput] [7] gtable_0.1.2 latticeExtra_0.6-26 MASS_7.3-33
[ModuleOutput] [10] Matrix_1.1-4 munsell_0.4.2 plyr_1.8.1
[ModuleOutput] [13] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.2
[ModuleOutput] [16] reshape2_1.4 scales_0.2.4 stringr_0.6.2
[ModuleOutput] [19] tools_3.1.0

Please also post some reproducing code and the output of traceback().

Yes, no problem.

I was using it with a dataset, so I'm going to post a simpler example.

# This is how to install packages in Azure ML
install.packages("src/Metrics_0.1.1.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/checkmate_1.7.0.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/mlr_2.7.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/xgboost_0.4-2.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/BBmisc_1.9.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/ParamHelpers_1.6.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/parallelMap_1.3.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/magrittr_1.5.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/BatchJobs_1.6.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/doParallel_1.0.10.zip", lib = ".", repos = NULL, verbose = TRUE)

library(magrittr, lib.loc=".", verbose=TRUE)
library(doParallel, lib.loc=".", verbose=TRUE)
library(xgboost, lib.loc=".", verbose=TRUE)
library(Metrics, lib.loc=".", verbose=TRUE)
library(checkmate, lib.loc=".", verbose=TRUE)
library(BBmisc, lib.loc=".", verbose=TRUE)
library(ParamHelpers, lib.loc=".", verbose=TRUE)
library(parallelMap, lib.loc=".", verbose=TRUE)
library(mlr, lib.loc=".", verbose=TRUE)

library(doParallel, lib.loc=".", verbose=TRUE)
library(Hmisc)

packageVersion("mlr")

sessionInfo()

Then I tried this simple code:

library(parallelMap)
parallelStartSocket(2) # start in socket mode and create 2 processes on localhost
f = function(i) i + 5 # define our job
y = parallelMap(f, 1:2) # like R's Map but in parallel
parallelStop() # turn parallelization off again

and I get the error on parallelStartSocket.

I really need the output of traceback()
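In a hosted environment such as Azure ML there may be no interactive console to call traceback() in after the error. A minimal sketch of one way to record the call stack from inside the script, using only base R: `capture_trace`, `inner_step`, `outer_step` and `function_missing_in_old_r` are hypothetical names made up for illustration, and the deliberately undefined function stands in for the missing dir.exists.

```r
# Sketch: record the call stack at the moment an error is signalled,
# for runs where traceback() cannot be called interactively.
capture_trace <- function(expr) {
  trace <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # The failing frames are still on the stack while this handler runs.
        calls <- as.list(sys.calls())
        trace <<- vapply(
          calls,
          function(cl) paste(deparse(cl), collapse = " "),
          character(1)
        )
      }
    ),
    # Return the error object instead of aborting the script.
    error = function(e) e
  )
  list(result = result, trace = trace)
}

# Hypothetical stand-in for the failing call: calling an undefined function
# raises the same kind of "could not find function" error.
inner_step <- function() function_missing_in_old_r("some/path")
outer_step <- function() inner_step()

out <- capture_trace(outer_step())
cat(out$trace, sep = "\n")  # call stack, outermost call first
```

To apply this to the case here, one would replace `outer_step()` with the failing call, for example `capture_trace(parallelStartSocket(2))`, and copy the printed stack from the module output log.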

dir.exists is not even used in parallelMap; I just scanned the code.

This issue here is basically the same thing
r-lib/pkgdown#33

So, dir.exists was added in 3.2.0, but you have 3.1.0

But like I said, I don't use this in parallelMap, so I need to see from the traceback where it is called before I can help.
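The version gap can be confirmed directly in the running session. The following is a rough sketch in base R, not the code of any package in this thread: `dir_exists_compat` is a hypothetical helper name, and the file.info fallback is just one plausible way to get the same answer on R older than 3.2.0.

```r
# Does the running R provide dir.exists()? (Added in R 3.2.0.)
r_is_old <- getRversion() < "3.2.0"
has_dir_exists <- exists("dir.exists", envir = baseenv(), mode = "function")

# Hypothetical fallback that behaves like dir.exists() on older R versions.
dir_exists_compat <- function(paths) {
  if (exists("dir.exists", envir = baseenv(), mode = "function")) {
    return(dir.exists(paths))
  }
  # file.info()$isdir is NA for paths that do not exist.
  isdir <- file.info(paths)$isdir
  !is.na(isdir) & isdir
}

dir_exists_compat(tempdir())                                # an existing directory
dir_exists_compat(file.path(tempdir(), "no-such-dir-here")) # a missing one
```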

mllg commented

Could be linked to checkmate::assertDirectory which calls dir.exists. But there is a backport in checkmate for this (https://github.com/mllg/checkmate/blob/master/R/backports.r).

This typically happens if you have multiple R versions installed on the system. E.g., if you install checkmate with R-3.2.1 but then load the package with R-3.1.0, the backported function will not exist because it was not defined during compile time.
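One way to check for this kind of mismatch is to compare the R version a package was built under with the R version that is loading it. This is a sketch under the assumption that the installed package's `Built` field starts with the usual "R x.y.z;" prefix; `built_under` and `built_with_newer_r` are hypothetical helper names.

```r
# Extract the R version recorded in an installed package's "Built" field.
built_under <- function(pkg, lib.loc = NULL) {
  built <- utils::packageDescription(pkg, lib.loc = lib.loc, fields = "Built")
  # The field typically looks like "R 3.2.1; x86_64-w64-mingw32; <date>; windows".
  sub("^R ([0-9.]+);.*$", "\\1", built)
}

# TRUE if the package binary was built under a newer R than the one running.
built_with_newer_r <- function(pkg, lib.loc = NULL) {
  package_version(built_under(pkg, lib.loc)) > getRversion()
}

# "stats" ships with R itself, so this is expected to be FALSE.
built_with_newer_r("stats")
```

For the setup in this thread the call would be something like `built_with_newer_r("checkmate", lib.loc = ".")`, since the packages were installed into the working directory.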

mllg commented

And after a closer look, I assume you are using Windows binaries. The issue with multiple R versions is exactly what is going wrong in your setup: you installed Windows binaries which were built with R-3.2.0 or higher and thus do not include the backport.

You should either upgrade R, install from CRAN or try a source installation.

@mllg
Thx

To the point:
I completely agree on what "goes wrong here", and that one should not, and cannot, do that.
Well, on a cluster, your first and last suggestion are often not possible.
(just saying, as I often hate such answers from the R mailing lists :) )

But I don't get why a proper install from CRAN is not done.

mllg commented

Let's just hope that @rubenohayon is not running a cluster with a Windows OS. 🙏

Well, read the OP:

I used parallelMap for Azure Machine Learning, b

I guess he is at least somewhere in the cloud

Hey,

Thanks for all your answers. Honestly, it's really fast in the cloud, so I don't really need to parallelize, but thanks for your help 👍