mlr-org/ParamHelpers

Discrete Parameters with Integer Levels


Please compare the types of the sampled values of bla. Using sampleValue we get an integer; via generateDesign we get a factor. Is this intended? It is, at the very least, strange.

library(ParamHelpers)

par.set = makeParamSet(
  makeDiscreteParam("bla", values = 1:5)
)

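# sampleValue returns the list representation: bla is an integer
str(sampleValue(par.set))
# generateDesign returns a data.frame: bla is a factor column
str(generateDesign(n = 1, par.set))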

After five more minutes of thinking: it is probably intended and, actually, well documented. Still, a second opinion would be nice.

?generateDesign

#' The following types of columns are created:
#' \tabular{ll}{
#'  numeric(vector)   \tab  \code{numeric}  \cr
#'  integer(vector)   \tab  \code{integer}  \cr
#'  discrete(vector)  \tab  \code{factor} (names of values = levels) \cr
#'  logical(vector)   \tab  \code{logical}
#' }

So this is "intended" and documented.

  1. The reason is that "values" can be arbitrarily complex objects, but I have to fit them into a df.
    Our convention is therefore: for dfs (as opposed to param values represented as lists), we use the
    encoding documented above (the names of the values become the factor levels).

  2. Please read convertDiscrete.R now.

  3. "sample" behaves differently because it outputs the list representation, so there we use the "value", not the "category" name.

  4. For generated dfs, you nearly always call dfRowsToList later, which then converts the "category names" to "values" (and you get your integers).

  5. Now, when "values" are simple scalars, you might want them in the "correct" type in the generated df (here, an integer column). I thought about this, and yes, you might sometimes want that.
    If you want this too, please help by creating a PR.
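To make the convention concrete, here is a minimal base-R sketch of the idea. It is not ParamHelpers' actual code (convertDiscrete.R is the authoritative version), and the helper names encodeDiscrete and decodeDiscrete are hypothetical. A discrete parameter's values live in a named list and may be arbitrary objects; a df column can only carry their names as factor levels, and decoding looks each name up in the values list again.

```r
# Hypothetical sketch, NOT ParamHelpers' internal implementation.
# Values of a discrete param, kept as a named list. They may be arbitrary
# objects (here two integers and a function), so they cannot be stored in
# a data.frame column directly.
values = list(a = 2L, b = 4L, sq = function(x) x^2)

# Encode for a data.frame: store only the value names, as a factor whose
# levels are the names of all possible values.
encodeDiscrete = function(chosen.names, values) {
  factor(chosen.names, levels = names(values))
}

# Decode back to the list representation: look up each level name in the
# values list to recover the original object.
decodeDiscrete = function(f, values) {
  lapply(as.character(f), function(nm) values[[nm]])
}

df = data.frame(p = encodeDiscrete(c("b", "sq", "a"), values))
str(df$p)          # a factor with levels "a", "b", "sq"

decoded = decodeDiscrete(df$p, values)
str(decoded[[1]])  # int 4: the original value, not the level name
decoded[[2]](3)    # 9: the function survives the round trip
```

Decoding needs the values list, which is why converting df rows back to parameter values has to know the par.set.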

Thanks, I didn't think about more complex parameter types. Strange at first thought, but sensible on the second and third.

I am using random search as a baseline method for MBO, so my code looked something like this:

  1. Generate an initial design via generateDesign. This design is used by both the MBO algorithms and the random search.
  2. For the random search, sample all remaining points via sampleValues.
  3. Add all points from 1) and 2) to an opt.path, which resulted in an error.

My mistake: I used convertRowsToList instead of dfRowsToList. Lesson learned, everything works now. Perhaps we could mention dfRowsToList in the documentation of generateDesign.
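The mismatch can be reproduced without ParamHelpers. In the base-R sketch below, design, sampled and the helper rowToValues are hypothetical stand-ins for a generateDesign row, a sampleValue result and dfRowsToList respectively; none of them is ParamHelpers code. A generic row-to-list conversion keeps the factor, whereas a conversion that looks the level up in the parameter's values restores the integer, so both kinds of points end up with the same type.

```r
# Base-R stand-ins (hypothetical, not ParamHelpers code) for the issue's
# par.set built with makeDiscreteParam("bla", values = 1:5).
bla.values = setNames(as.list(1:5), as.character(1:5))
values.by.param = list(bla = bla.values)

# Stand-in for a generateDesign() row: bla is a factor of value names.
design = data.frame(bla = factor("3", levels = names(bla.values)))

# Stand-in for a sampleValue() result: bla is the value itself.
sampled = list(bla = 4L)

# Generic conversion that ignores the par.set: the factor is kept,
# so its type differs from sampled$bla.
generic = as.list(design[1, , drop = FALSE])
class(generic$bla)    # "factor"

# par.set-aware conversion (the job dfRowsToList does per this thread):
# map each factor level back to the value it names.
rowToValues = function(df, i, values.by.param) {
  row = lapply(names(df), function(p) {
    x = df[[p]][i]
    if (is.factor(x)) values.by.param[[p]][[as.character(x)]] else x
  })
  setNames(row, names(df))
}

converted = rowToValues(design, 1, values.by.param)
class(converted$bla)  # "integer", same as class(sampled$bla)
```

Looking the value up by level name, rather than calling as.integer() on the factor, is deliberate: as.integer() returns the internal level codes, which coincide with the values here only because bla's values are exactly 1:5.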

> mention dfRowsToList in the documentation of generateDesign.

Good idea. Maybe use it in an example?