mlr-org/mlrMBO

R session aborts after several hundred sequential 'mbo' function calls with mlrMBO

ricky151192 opened this issue · 8 comments

I recently updated mlrMBO to version 1.1.3 on R 3.5.1 with Windows 10.
I am currently running an experiment that requires hundreds of runs using the mlrMBO library.
However, I found that a problem in mlrMBO seems to abort the R session without any warning or error message (with Rgui.exe, Rscript.exe, and RStudio alike).
To be clear, the code below calls the “mbo” function several hundred times sequentially, not in parallel.
I also monitored the RAM usage of the R process during execution, and it never reached a critical level.
Below is a simple example that aborts the R session every time I run it.
It consists of two scripts: “mainDeterministicmlrMBO.R” and “trialDeterministicmlrMBO.R”.
-------- mainDeterministicmlrMBO.R -------

library(mlrMBO)

obj.fun = makeSingleObjectiveFunction(
  name = "SineMixture",
  fn = function(x) sin(x[1])*cos(x[2])/2 + 0.04 * sum(x^2),
  par.set = makeNumericParamSet(id = "x", len = 2, lower = -5, upper = 5)
)

if(!dir.exists(file.path("./", "trial/"))){dir.create("./trial")}
beta <- 0.5

for(kernel in  c("gauss","matern3_2","matern5_2","exp","powexp")){
  for(seed in 1:10){
    source("trialDeterministicmlrMBO.R")
    print(paste0("seed", seed))
  }
  print(paste0("kernel", kernel))
}

-------- trialDeterministicmlrMBO.R ---------

set.seed(seed)
for (iter in 1:10){
  
  ctrl = makeMBOControl(propose.points = 1L)

  ctrl = setMBOControlTermination(ctrl, iters = 1)

  design = generateDesign(n = 3L, par.set = getParamSet(obj.fun),
                       fun = lhs::randomLHS)

  ctrl_LCB = setMBOControlInfill(ctrl, crit = makeMBOInfillCritCB(cb.lambda = beta), opt = "focussearch", opt.focussearch.maxit=NULL, opt.focussearch.points = 2000L)

  ctrl_EI = setMBOControlInfill(ctrl, crit = makeMBOInfillCritAEI(), opt = "focussearch", opt.focussearch.maxit=NULL, opt.focussearch.points = 2000L)

  ctrl_PoI = setMBOControlInfill(ctrl, crit = makeMBOInfillCritEQI(), opt = "focussearch", opt.focussearch.maxit=NULL, opt.focussearch.points = 2000L)
  
  lrn = makeMBOLearner(ctrl, obj.fun, optim.method = "BFGS",config = list(show.learner.output = FALSE))
  lrn$par.vals$covtype=kernel
  
  design_LCB <- design_EI <- design_PoI <- design
  
  for( k in 4:15 ) {
   
    run <- mbo(obj.fun, design=design_LCB, learner = lrn, control = ctrl_LCB, 
               show.info = getOption("mlrMBO.show.info", F), more.args = list() )
    next_LCB <- getOptPathX(run$opt.path)[k,] 
    
    run <- mbo(obj.fun, design=design_EI, learner = lrn, control = ctrl_EI,
               show.info = getOption("mlrMBO.show.info", F), more.args = list() )
    next_EI <- getOptPathX(run$opt.path)[k,]
    
    run <- mbo(obj.fun, design=design_PoI,learner = lrn, control = ctrl_PoI, 
               show.info = getOption("mlrMBO.show.info", F), more.args = list() )
    next_PoI <- getOptPathX(run$opt.path)[k,]
    
    design_LCB <- rbind(design_LCB, next_LCB)
    design_EI <- rbind(design_EI, next_EI)
    design_PoI <- rbind(design_PoI, next_PoI)
   
  }
  save(design_LCB, design_EI, design_PoI, 
       file=paste0("./trial/Subject_",iter,"_Seed_",seed,"_",kernel,".RData" ))
  print(paste0("Subject=",iter))
}

Thank you in advance.

Thanks for your report. Can you please provide a minimal example that reproduces the issue?

I already did that in the issue above, with the two scripts "mainDeterministicmlrMBO.R" and "trialDeterministicmlrMBO.R".

It took about 20 minutes of running your code for me to trigger the bug, somewhere in the 4 nested loops, doing three different calls to mbo and various bookkeeping things. Digging through all of this makes debugging much harder than necessary; please provide a simple example that immediately shows the bug.

mb706 commented

I'm on it. The problem seems to be some C++ code in the lhs package, which I assume causes memory corruption that does not immediately trigger a crash. I assume that is why ricky151192's code example is so long.
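
For anyone who wants to narrow the search down, here is a minimal sketch that exercises only the design-generation step in a tight loop. It assumes the suspected code path is reached through lhs::randomLHS (the generator passed to generateDesign in the report above). stress_generator is a hypothetical helper name, and nothing in this thread confirms that this loop alone reproduces the crash; because the corruption does not crash immediately, many iterations may be needed, or it may not trigger at all.

```r
# Hypothetical helper: call a design generator many times and check that each
# result is still a well-formed n x k matrix.
stress_generator <- function(gen, n_calls = 1000L, n = 3L, k = 2L) {
  for (i in seq_len(n_calls)) {
    d <- gen(n, k)
    stopifnot(is.matrix(d), nrow(d) == n, ncol(d) == k)
  }
  invisible(n_calls)
}

# Only run against lhs when it is installed (assumption: same n = 3 points and
# k = 2 dimensions as in the original scripts).
if (requireNamespace("lhs", quietly = TRUE)) {
  set.seed(1)
  stress_generator(lhs::randomLHS)
}
```

Passing the generator in as an argument makes it easy to swap in another design function for comparison.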

mb706 commented

The bug was this one: bertcarnell/lhs#21

My PR for the lhs package should fix this. @ricky151192 you can install it using

remotes::install_github("mb706/lhs")

and see if that works.
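
To confirm which lhs build is actually installed afterwards, the package metadata can be inspected. This is a sketch under the assumption that a GitHub install via remotes records a RemoteUsername field in the installed DESCRIPTION; installed_from_github is a hypothetical helper, not part of remotes or lhs.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and its DESCRIPTION
# records `user` as the GitHub account it was installed from.
installed_from_github <- function(pkg, user) {
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!is.list(desc)) {
    return(FALSE)  # packageDescription() returns NA when pkg is not installed
  }
  identical(desc$RemoteUsername, user)
}

# Usage: check whether the patched fork is the installed copy of lhs.
installed_from_github("lhs", "mb706")
```

Restart the R session after installing, so that the previously loaded copy of lhs is not still in use.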

mb706 commented

P.S. @ricky151192 thanks for providing the example. We have known for a while that there is a segfault hidden somewhere in the mlrMBO execution path, but it was triggered so infrequently that we were not able to find it until now.

I am glad to have been helpful! With this version of the lhs package, I was able to run the previous example code without errors or crashes.
Thank you!

mb706 commented

That solves the issue as far as mlrMBO is concerned.