mlr-org/mlrMBO

nested optimization for complex dependencies between params

vrodriguezf opened this issue · 7 comments

Hi,

I have a NumericVectorParam to optimize (init), whose length n needs to be optimized too. Currently, the range of possible values for n is small, so I am looping over all of them and merging the optimization paths afterwards.

library(mlr)  # provides tuneParams, makeParamSet, etc.

n_range = 2:10
experiment.result = purrr::map(n_range, function (n) {
  tune.result = tuneParams(
    learner = "classif.fda.hmm",
    task = myTask,
    resampling = makeResampleDesc("Holdout"),
    measures = list(acc),
    par.set = makeParamSet(
      # within this loop iteration, n is fixed to a single value
      makeIntegerParam(id = "n", lower = n, upper = n),
      makeNumericVectorParam(id = "init", len = n, lower = 0, upper = 1)
    ),
    control = makeTuneControlMBO()
  )
  tune.result
})
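The merging step afterwards is then roughly along these lines (simplified sketch; assuming acc is the only measure, so each result's y element is named acc.test.mean):

# stack the per-n optimization paths into one data.frame
# (init columns missing for smaller n are filled with NA)
paths = purrr::map_dfr(experiment.result, function(res) as.data.frame(res$opt.path))
# pick the run with the best holdout accuracy
best = experiment.result[[which.max(
  purrr::map_dbl(experiment.result, function(res) res$y[["acc.test.mean"]]))]]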

Is it possible to perform a nested MBO process, so that the outer call optimizes the value of n and the inner call optimizes the values of init?

Many thanks for your help!!

mb706 commented

The easiest solution I can think of is to write your own learner with all the arguments of your desired learner plus an additional size argument, and to truncate the vector parameter to that length inside the learner. You'll have to read up on how to create custom learners for this.
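Very roughly, such a custom learner could look like this (just a sketch with placeholder names; the actual model fit and a matching predictLearner are omitted and depend on your model):

# sketch: a classif learner with an explicit n (size) parameter that truncates init
makeRLearner.classif.mycustom <- function() {
  makeRLearnerClassif(
    cl = "classif.mycustom",
    package = character(0),   # or the package your model lives in
    par.set = makeParamSet(
      makeIntegerLearnerParam("n", lower = 2L, upper = 10L),
      # always declared with the maximum length; the learner ignores the tail
      makeNumericVectorLearnerParam("init", len = 10L, lower = 0, upper = 1)
    ),
    properties = c("twoclass", "multiclass", "numerics")
  )
}

trainLearner.classif.mycustom <- function(.learner, .task, .subset, .weights = NULL, n, init, ...) {
  init <- init[seq_len(n)]   # truncate init to the requested length
  # ... fit the underlying model on getTaskData(.task, .subset) with this init ...
}
# a matching predictLearner.classif.mycustom is needed as well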

A quick and dirty solution without a custom learner that I can think of would be to use ModelMultiplexer. I haven't tested this, but to give you an idea:

# create a few learners to put into ModelMultiplexer
# all learners must have different IDs.
# I'm creating three learners explicitly to make it clear what is
# happening. For a large range of n you would build them in a loop
# instead (see the loop sketch after this example)
lrn2 <- makeLearner("classif.fda.hmm", id = "dim2")
lrn3 <- makeLearner("classif.fda.hmm", id = "dim3")
lrn4 <- makeLearner("classif.fda.hmm", id = "dim4")
mmlearner <- makeModelMultiplexer(list(lrn2, lrn3, lrn4))
# inspect mmlearner to see what is happening here:
# the 'selected.learner' parameter selects which of the wrapped
# learners (and hence which length of init) gets used
par.set <- makeParamSet(
  makeIntegerParam("selected.learner", lower = 2, upper = 4, # don't use a DiscreteParam for this here!
    trafo = function(x) paste0("dim", x)),
  makeNumericVectorParam("dim2.init", len = 2, lower = 0, upper = 1,
    requires = quote(selected.learner == 2)),
  makeNumericVectorParam("dim3.init", len = 3, lower = 0, upper = 1,
    requires = quote(selected.learner == 3)),
  makeNumericVectorParam("dim4.init", len = 4, lower = 0, upper = 1,
    requires = quote(selected.learner == 4)))
hout <- makeResampleDesc("Holdout")
control <- makeTuneControlMBO()
trs <- tuneParams(mmlearner, myTask, hout, par.set = par.set, control = control)
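To spell out the loop mentioned in the comments, the learners and parameters for a larger range of n could be generated like this (untested sketch, mirroring the explicit version above):

n.range <- 2:10
learners <- lapply(n.range, function(n)
  makeLearner("classif.fda.hmm", id = paste0("dim", n)))
mmlearner <- makeModelMultiplexer(learners)
par.set <- do.call(makeParamSet, c(
  list(makeIntegerParam("selected.learner",
    lower = min(n.range), upper = max(n.range),
    trafo = function(x) paste0("dim", x))),
  lapply(n.range, function(n)
    makeNumericVectorParam(paste0("dim", n, ".init"), len = n, lower = 0, upper = 1,
      requires = substitute(selected.learner == N, list(N = n))))))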

The problem here is that the parameter space grows quadratically with the range of n.

Btw, in my experience, tuning with mlrMBO has not handled "multiplexed" learners very well; if you use the approach given above you may want to try the "irace" tuner in mlr (makeTuneControlIrace).

[EDIT: apparently I had the parameter names wrong for your example]

mb706 commented

Bonus solution, very hacky, use at your own risk: Use trafo in a way it is not supposed to be used!
The trick here is to use a dummy parameter that does not influence the actual learner behaviour; .weights works if your learner does not have the "weights" property.

learner <- makeLearner("classif.fda.hmm")
# add an extra integer "size" parameter to the learner's parameter set
learner$par.set <- c(learner$par.set, makeParamSet(makeIntegerLearnerParam("size")))
par.set <- makeParamSet(
  makeIntegerParam(".weights", lower = 2, upper = 10,
    trafo = function(x) GLOBALSIZE <<- x),   # abuse the trafo to stash the size in a global
  makeNumericVectorParam("init", len = 10, lower = 0, upper = 1,
    trafo = function(x) x[1:GLOBALSIZE]))    # truncate init to the stashed size
control <- makeTuneControlMBO()
trs <- tuneParams(learner, myTask, hout, par.set = par.set, control = control)

vrodriguezf commented

Wow! Many thanks for those 3 options, awesome help here.

Well, I have some bonus questions now:

  1. Regarding solution 1: The learner I am using is actually a custom learner, and it has a size param (an IntegerParam called n). So, if I understood correctly, what you propose is to give a maximum value for the size (e.g. n = 10) and then truncate the dependent parameters (the NumericVector called init) depending on the actual value of n... so should I set the length of init to 10 beforehand?

  2. Regarding solution 2: I don't think it is a dirty solution; actually, I did not know about makeModelMultiplexer, and it has given me ideas for other things too ;). Only one question: does makeTuneControlIrace accept numeric vectors?

  3. Wow, how does that work? Is .weights a special keyword for all learners?

At first glance, I see option 2 as the most suitable, maybe because it is the one I understand best.

Thank you again!

mb706 commented

For 1., you would set the len of the vector parameter to the maximum length you are considering. Within the learner function you would then discard the extra dimensions, so somewhere in your learner function you could have something like init = init[1:n] (although see below). Your learner always gets called with the maximum number of elements in init; it just disregards some of them depending on n.
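The tuning parameter set then just stays at the maximum length, something like (sketch, assuming your learner's size parameter is the integer n):

par.set <- makeParamSet(
  makeIntegerParam("n", lower = 2, upper = 10),
  # always the maximum length; the learner only looks at the first n entries
  makeNumericVectorParam("init", len = 10, lower = 0, upper = 1))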

To the optimization algorithm there is a difference between using ModelMultiplexer and using a parameter to discard some of the dimensions: with ModelMultiplexer you have a parameter dim2.init and an unrelated parameter dim3.init, while with truncation you use the same parameter for all dimension cases. Your tuner may find, for example, that the specific vector c(A, B, C) works well for init with n == 3 (or dim3.init). If you just truncate the init vector within the learner function, the vector c(A, B) will then also be preferred when n == 2 (the tuner doesn't know that C is not being used and simply keeps proposing c(A, B, C)); if you use ModelMultiplexer, and hence a different parameter for the n == 2 case, the tuner cannot carry over any information. As long as there is some relationship between your outcome (model performance) and the vector values even across different values of n, you should prefer the truncation method.

When truncating, it might be useful to consider which part of the vector to drop. Ask yourself: is the performance with init = c(A, B, C, D) generally more similar to init = c(A, B), to init = c(C, D), to init = c(A, D), or to something else (e.g. init = c(A + B, C + D) / 2)?

You should be aware that optimising over a high-dimensional parameter space is expensive, so if you have some way of parametrising the most likely interesting values of init with fewer parameters, you should probably do that.

You should not consider 3. if you have the option of writing the learner function yourself. .weights is a special parameter name because it clashes with the internal parameter used for handling case weights in the training data; learners without the "weights" property ignore it, which is why it remains a "free" dummy parameter.

Tuning with irace should work out of the box with mlr just as much as MBO does. (There is a known issue with requirements that depend on DiscreteParams whose values are not exclusively character, but most people don't use non-character DiscreteParams.)
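For example, only the control object needs to change (maxExperiments is irace's evaluation budget; 200 is an arbitrary value):

control <- makeTuneControlIrace(maxExperiments = 200L)
trs <- tuneParams(learner, myTask, hout, par.set = par.set, control = control)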

vrodriguezf commented

Thanks @mb706, nice explanation.

Two things about the truncation strategy:

  1. Assuming that the maximum value for n is 10, the initial design of the parameter init in MBO would always be 10-dimensional, right? Doesn't that affect the performance of the optimization process?

  2. Another issue is that the values of the init vector have a constraint: they must sum to 1 (it is a probability vector, see #451). Currently I am coding that as a trafo, but I don't know how to proceed if the vector always has the maximum length.

Many thanks for your help again!!

mb706 commented
  1. Exactly. Which method you choose will very likely affect optimization performance; it's hard to say in which way.
  2. You could do the transformation inside the learner. I.e.
    init <- init[1:n]
    init <- normalise(init)
    (for whatever normalisation method you choose)
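For a probability vector, one simple choice (by no means the only one) would be:

normalise <- function(x) x / sum(x)   # rescale so the truncated entries sum to 1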

vrodriguezf commented

Ok, now I have a clearer picture of this issue. I'll report back in case of further related questions.

Thank you again!