jakob-r/mlrHyperopt

Can't reproduce the results

Closed this issue · 10 comments

I use mlrHyperopt and set a seed for the random number generator. Unfortunately, I can't reproduce the results.

The code:

library(mlr)
library(mlrHyperopt)

optimize <- function() {
  hyperopt(sonar.task,
           learner = "classif.rpart",
           show.info = FALSE)
}

set.seed(123, "L'Ecuyer")
rez_1 <- optimize()

set.seed(123, "L'Ecuyer")
rez_2 <- optimize()

rez_1
rez_2
identical(rez_1, rez_2)

Results in:

# > rez_1
# Tune result:
#     Op. pars: cp=0.000984; maxdepth=30; minbucket=5; minsplit=5
# mmce.test.mean=0.274

# > rez_2
# Tune result:
#     Op. pars: cp=0.207; maxdepth=11; minbucket=33; minsplit=34
# mmce.test.mean=0.284

# > identical(rez_1, rez_2)
# [1] FALSE

Where is the mistake?

There is no mistake on your side. When you look deeper into the results

as.data.frame(rez_1$opt.path)
as.data.frame(rez_2$opt.path)

or in this case also the following

as.data.frame(rez_1$mbo.result$opt.path)
as.data.frame(rez_2$mbo.result$opt.path)

you will see that the optimization started with the same initial design (the rows with dob = 0 in the mbo.result) - that wouldn't happen with a different seed.
The problem here is that for this scenario we use mlrMBO for the optimization, which relies on DiceKriging for the surrogate. It seems that for specific settings this is not reproducible even with the same seed. I have yet to find out which settings trigger this behavior.
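
As a quick check, here is a minimal sketch of that comparison (assuming the standard mlrMBO opt.path columns: dob plus the parameter names shown in the tune results above):

d1 <- as.data.frame(rez_1$mbo.result$opt.path)
d2 <- as.data.frame(rez_2$mbo.result$opt.path)
pars <- c("cp", "maxdepth", "minbucket", "minsplit")
# the rows with dob == 0 form the initial design
identical(d1[d1$dob == 0, pars], d2[d2$dob == 0, pars])
## expected: TRUE - the two runs only diverge after the initial design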

However, let me note that mlrHyperopt is not yet intended for a reproducible research workflow.

ck37 commented

That sounds fair, yet it would be important to note the lack of replicability in the README.md etc., imo. People need proactive information so that they don't (yet) use it in papers they want to publish.

It seems to be related to rpart. Running the following works as expected, even though it also relies on mlrMBO.

optimize <- function() {
  hyperopt(sonar.task,
           learner = "classif.randomForest",
           show.info = FALSE)
}

set.seed(123, "L'Ecuyer")
rez_1 <- optimize()

set.seed(123, "L'Ecuyer")
rez_2 <- optimize()

rez_1
rez_2
identical(rez_1$opt.path$env$path$mmce.test.mean, rez_2$opt.path$env$path$mmce.test.mean)

Please note that you cannot compare the complete result objects, as they also include timings of how long the evaluations took, and those naturally vary.
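
If you want a check that ignores the timings, a sketch (assuming the usual mlr TuneResult fields $x and $y) is to compare only the tuned parameters and the achieved performance:

identical(rez_1$x, rez_2$x)  # optimal hyperparameter values
identical(rez_1$y, rez_2$y)  # achieved performance (mmce.test.mean)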

Found the culprit: it's rgenoud, which is used by DiceKriging, which in turn is used by mlrMBO for purely numeric optimization problems, which is the case when optimizing the parameters of rpart. For optimizing the parameters of randomForest a different method is used, which does not cause any trouble.
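
To see whether rgenoud is really the source of the non-determinism, one can take mlrHyperopt out of the picture and fit the surrogate directly. This is only a sketch, assuming that DiceKriging's optim.method = "gen" routes the covariance parameter estimation through rgenoud::genoud; the toy design and response are made up for illustration:

library(DiceKriging)

fit_km <- function() {
  X <- expand.grid(x1 = seq(0, 1, length.out = 5),
                   x2 = seq(0, 1, length.out = 5))
  y <- apply(X, 1, function(r) sum(sin(10 * r)))
  # optim.method = "gen" uses the genoud optimizer from the rgenoud package
  km(design = X, response = y, optim.method = "gen",
     control = list(trace = FALSE))
}

set.seed(123, "L'Ecuyer")
m1 <- fit_km()
set.seed(123, "L'Ecuyer")
m2 <- fit_km()
identical(coef(m1), coef(m2))  # FALSE would point at rgenoud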

I noticed that irreproducible results occur with learner = "classif.rpart" too.

Will it take long to fix the bug in rgenoud?
Should a new issue be raised in the mlrMBO GitHub repository?

I tried running the code with learner = "classif.randomForest" several times, and the result is not always the same:

library(mlr)
library(mlrHyperopt)

optimize <- function() {
    hyperopt(sonar.task,
             learner = "classif.randomForest",
             show.info = FALSE)
}

set.seed(123, "L'Ecuyer")
rez_1 <- optimize()

set.seed(123, "L'Ecuyer")
rez_2 <- optimize()

set.seed(123, "L'Ecuyer")
rez_3 <- optimize()

set.seed(123, "L'Ecuyer")
rez_4 <- optimize()

Then I got many warnings like this:

Warning in generateDesign(control$infill.opt.focussearch.points, ps.local,  :
  generateDesign could only produce 600 points instead of 1000!

And one final warning:

Warning message:
In (function (fn, nvars, max = FALSE, pop.size = 1000, max.generations = 100,  :
  Stopped because hard maximum generation limit was hit.

(Are these warnings expected?)

But the most important thing is that the optimization results for randomForest are not always the same:

rez_1
## Tune result:
## Op. pars: nodesize=1; mtry=12
## mmce.test.mean=0.139

rez_2
## Tune result:
## Op. pars: nodesize=1; mtry=9
## mmce.test.mean=0.144

rez_3
## Tune result:
## Op. pars: nodesize=1; mtry=12
## mmce.test.mean=0.139

rez_4
## Tune result:
## Op. pars: nodesize=1; mtry=12
## mmce.test.mean=0.139

identical(rez_1$opt.path$env$path$mmce.test.mean,
          rez_2$opt.path$env$path$mmce.test.mean) &
identical(rez_3$opt.path$env$path$mmce.test.mean,
          rez_4$opt.path$env$path$mmce.test.mean) 
## [1] FALSE

Is it still the fault of the rgenoud package?

1. Both warnings are to be expected and are caused by other packages.

2. It's always a hassle to figure out where reproducibility fails. Most packages are probably only tested against the default Mersenne-Twister; using it gives the expected identical results. I don't know the special characteristics of L'Ecuyer, but the help of set.seed mentions that it has to be initialized with an integer of at least 6 digits, with neither the first 3 nor the last 3 being zero. So when I use set.seed(123456, "L'Ecuyer"), I get reproducible results as expected (see the sketch below).
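
As a sketch of that workaround, the example above should then behave as follows once the seed is a proper 6-digit integer (or once the default Mersenne-Twister is used):

set.seed(123456, "L'Ecuyer")
rez_1 <- optimize()

set.seed(123456, "L'Ecuyer")
rez_2 <- optimize()

identical(rez_1$opt.path$env$path$mmce.test.mean,
          rez_2$opt.path$env$path$mmce.test.mean)
## expected: TRUE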

I just wanted to mention that the problem in the rgenoud package has now been fixed, so you should be able to get reproducible results.

Good news. Thank you for the message.

I checked. Now the results are reproducible.

Thanks once more.