nested optimization for complex dependencies between params
vrodriguezf opened this issue · 7 comments
Hi,

I have a `NumericVectorParam` to optimize (`init`), whose length `n` needs to be optimized too. Currently the range of possible values for `n` is small, so I am looping over all of them and merging the optimization paths afterwards:
```r
n_range <- 2:10
experiment.result <- purrr::map(n_range, function(n) {
  tuneParams(
    learner = "classif.fda.hmm",
    task = myTask,
    resampling = makeResampleDesc("Holdout"),
    measures = list(acc),
    par.set = makeParamSet(
      makeIntegerParam(id = "n", lower = n, upper = n),
      makeNumericVectorParam(id = "init", len = n, lower = 0, upper = 1)
    ),
    control = makeTuneControlMBO()
  )
})
```
Is it possible to perform a nested MBO process, so that the outer call optimizes the value of `n` and the inner call optimizes the values of `init`?

Many thanks for your help!!
The easiest solution I can think of is to write your own learner with all the arguments of your desired learner, plus an additional `size` argument that truncates the `init` parameter to the desired length. You'll have to read up on how to create custom learners for this.
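As a minimal sketch of the truncation idea (plain R; the helper name `truncate_init` is hypothetical, not part of the mlr API): the learner always receives the full-length `init` vector and discards everything beyond `size` before fitting:

```r
# Hypothetical helper: keep only the first 'size' elements of 'init'.
# A custom learner's train function would call something like this
# before passing 'init' on to the underlying fitting routine.
truncate_init <- function(init, size) {
  stopifnot(size >= 1, size <= length(init))
  init[seq_len(size)]
}

truncate_init(c(0.1, 0.2, 0.3, 0.4), 2)
# [1] 0.1 0.2
```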
A quick and dirty solution without a custom learner would be to use `ModelMultiplexer`. I haven't tested this, but to give you an idea:
```r
# Create a few learners to put into ModelMultiplexer.
# All learners must have different IDs.
# I'm creating three learners explicitly to make it clear what is
# happening. For a large range of n you would use some form
# of loop to do this.
lrn2 <- makeLearner("classif.fda.hmm", id = "dim2")
lrn3 <- makeLearner("classif.fda.hmm", id = "dim3")
lrn4 <- makeLearner("classif.fda.hmm", id = "dim4")
mmlearner <- makeModelMultiplexer(list(lrn2, lrn3, lrn4))

# Inspect mmlearner to see what is happening here:
# 'selected.learner' selects which learner (and hence which
# length of 'init') gets used.
par.set <- makeParamSet(
  makeIntegerParam("selected.learner", lower = 2, upper = 4,  # don't use a DiscreteParam for this here!
    trafo = function(x) paste0("dim", x)),
  makeNumericVectorParam("dim2.init", len = 2, lower = 0, upper = 1,
    requires = quote(selected.learner == 2)),
  makeNumericVectorParam("dim3.init", len = 3, lower = 0, upper = 1,
    requires = quote(selected.learner == 3)),
  makeNumericVectorParam("dim4.init", len = 4, lower = 0, upper = 1,
    requires = quote(selected.learner == 4))
)
control <- makeTuneControlMBO()
trs <- tuneParams(mmlearner, myTask, hout, par.set = par.set, control = control)
```
The problem here is that you get a quadratically larger parameter space.
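The loop-based construction mentioned in the comments above could look roughly like this (untested sketch; it assumes mlr is loaded and that `classif.fda.hmm`, the asker's custom learner, is registered, so it will not run standalone):

```r
library(mlr)

n_range <- 2:10

# One learner per candidate length, each with a distinct ID
learners <- lapply(n_range, function(n) {
  makeLearner("classif.fda.hmm", id = paste0("dim", n))
})
mmlearner <- makeModelMultiplexer(learners)

# One conditional 'init' parameter per candidate length.
# Note: quote() would capture the symbol 'n', not its value, so
# substitute() is used to bake the current n into each requirement.
init.params <- lapply(n_range, function(n) {
  makeNumericVectorParam(paste0("dim", n, ".init"),
    len = n, lower = 0, upper = 1,
    requires = substitute(selected.learner == nn, list(nn = n)))
})
par.set <- do.call(makeParamSet,
  c(list(makeIntegerParam("selected.learner", lower = 2, upper = 10,
      trafo = function(x) paste0("dim", x))),
    init.params))
```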
By the way, in my experience tuning with mlrMBO has not handled "multiplexed" learners very well; if you use the approach given above, you may want to try the "irace" tuner in mlr (`makeTuneControlIrace`).
[EDIT: apparently I had the parameter names wrong for your example]
Bonus solution, very hacky, use at your own risk: use `trafo` in a way it is not supposed to be used!

The trick here is to use a dummy parameter that does not influence the actual learner behaviour. `.weights` works if your learner does not have the `"weights"` property.
```r
learner <- makeLearner("classif.fda.hmm")
learner$par.set <- c(learner$par.set, makeParamSet(makeIntegerLearnerParam("size")))
par.set <- makeParamSet(
  # dummy parameter: its trafo only smuggles the desired length
  # into the global variable GLOBALSIZE
  makeIntegerParam(".weights", lower = 2, upper = 10,
    trafo = function(x) GLOBALSIZE <<- x),
  # always proposed at full length; the trafo truncates it
  makeNumericVectorParam("init", len = 10, lower = 0, upper = 1,
    trafo = function(x) x[1:GLOBALSIZE])
)
control <- makeTuneControlMBO()
trs <- tuneParams(learner, myTask, hout, par.set = par.set, control = control)
```
Wow! Many thanks for those 3 options, awesome help here.
Well, I have some bonus questions now:
- Regarding solution 1: the learner I am using is actually a custom learner, and it has a size param (an `IntegerParam` called `n`). So, if I understood correctly, what you propose is to give a maximum value for the size (e.g. `n = 10`) and then truncate the dependent parameters (the `NumericVector` called `init`) depending on the actual value of `n`... so should I set the length of `init` to 10 beforehand?
- Regarding solution 2: I don't think it is a dirty solution; actually, I did not know about `makeModelMultiplexer`, and it has given me ideas for other things too ;). Only one question: does `makeTuneControlIrace` accept numeric vectors?
- Regarding solution 3: wow, how does that work? Is `.weights` a special keyword for all learners?
At first glance, I see option 2 as the most suitable, maybe because it is the one I understand best.
Thank you again!
For 1., you would set the `len` of the vector parameter to the maximum length you are considering. Within the learner function you would then discard the extra dimensions, so somewhere in your learner function you could have something like `init = init[1:n]` (although see below). Your learner always gets called with the maximum number of elements in `init`; it just disregards some of them depending on `n`.
To the optimization algorithm there is a difference between using `ModelMultiplexer` and using a parameter to discard some of the dimensions: with `ModelMultiplexer` you have a parameter `dim2.init` and an unrelated parameter `dim3.init`, while with truncation you use the same parameter for all dimension cases. Your tuner may find, for example, that the specific vector `c(A, B, C)` works well for `init` with `n == 3` (or `dim3.init`). If you just truncate the `init` vector within the learner function, then the vector `c(A, B)` will be preferred when `n == 2` (i.e. the tuner doesn't know that `C` is not being used and just tries `c(A, B, C)`); if you use `ModelMultiplexer`, and hence a different parameter for the `n == 2` case, then the tuner will not be able to carry over any information. As long as there is some relationship between your outcome (model performance) and the vector values even for different values of `n`, you should prefer the truncation method.

When truncating, it might be useful to consider what part of the vector to truncate. Ask yourself: is the performance with `init = c(A, B, C, D)` generally more similar to `init = c(A, B)`, to `init = c(C, D)`, to `init = c(A, D)`, or to something else (`init = c(A + B, C + D)/2`)?
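The truncation alternatives from the question above can be written out directly (plain R; the helper names are hypothetical illustrations, not mlr API):

```r
# Three ways to reduce a length-4 vector to length 2, corresponding
# to the alternatives discussed above:
keep_front <- function(x, n) x[seq_len(n)]                          # c(A, B)
keep_back  <- function(x, n) x[seq(length(x) - n + 1, length(x))]   # c(C, D)
pool_pairs <- function(x, n) {                                      # c(A + B, C + D) / 2
  groups <- rep(seq_len(n), each = length(x) / n)
  tapply(x, groups, sum) / (length(x) / n)
}

x <- c(0.1, 0.2, 0.3, 0.4)
keep_front(x, 2)  # 0.1 0.2
keep_back(x, 2)   # 0.3 0.4
pool_pairs(x, 2)  # 0.15 0.35
```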
You should be aware that optimising over a high-dimensional parameter space is expensive, so if you have some way of parametrising the most likely interesting values of `init` with fewer parameters, you should probably do that.
You should not consider 3. if you have the option of writing the learner function yourself; `.weights` is a special parameter name because it clashes with the internal parameter used for handling case weights in training data, which learners without the `"weights"` property ignore, so it remains available as a "free" dummy parameter.
Tuning with `irace` should work out of the box with `mlr` just as much as `mbo` does. (There is an issue that it has problems with requirements that depend on a `DiscreteParam` that does not have exclusively `character` values, but most people don't use non-`character` `DiscreteParam`s.)
Thanks @mb706, nice explanation.

Two things about the truncation strategy:

- Assuming that the maximum value for `n` is 10, the initial design of the parameter `init` in MBO would always be 10-dimensional, right? Does that not affect the performance of the optimization process?
- Another issue is that the values of the vector `init` have a constraint: they must sum to 1 (it is a probability vector, see #451). Currently I am coding that as a `trafo`, but I don't know how to proceed if the vector always has the maximum length.

Many thanks for your help again!!
- Exactly. Which method you choose will very likely affect optimization performance; it is a bit hard to say in which way.
- You could do the transformation inside the learner, i.e. (for whatever normalisation method you choose):

```r
init <- init[1:n]
init <- normalise(init)
```
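Since `init` must sum to 1 (the probability-vector constraint from the question above), one natural choice for `normalise` is dividing by the sum; a minimal sketch:

```r
# Rescale a non-negative vector so its elements sum to 1
normalise <- function(x) x / sum(x)

init <- c(0.2, 0.3, 0.5, 0.8)   # full-length vector from the tuner
n <- 2
init <- normalise(init[1:n])    # truncate first, then renormalise
init
# [1] 0.4 0.6
```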
Ok, now I have a clearer picture of this issue. I'll report back in case of further related questions.
Thank you again!