ja-thomas/OMLbots

n does not equal n in resampling

Some hyperparameters are set according to the number of observations (currently only min.node.size):

n = nrow(task$task$input$data.set$data)
par$min.node.size = round(2^(log(n, 2) * par$min.node.size))

This does not respect the resampling. With 10-fold CV it should be, e.g.:

n = nrow(task$task$input$data.set$data)/10
par$min.node.size = round(2^(log(n, 2) * par$min.node.size)) 

Theoretically, all performances for min.node.size in [log(n/10)/log(n), 1] should be the same, because min.node.size is then set to a value larger than the actual n used in resampling.
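
To make the effect concrete, here is a minimal sketch of the scaling with the number of rows passed in explicitly. The helper name `scale_min_node_size` and the values of `n_total` and `p` are made up for illustration and are not part of OMLbots. The row count is left as a parameter on purpose: the example above divides by 10, while in k-fold CV each model is usually trained on roughly (k-1)/k of the rows, so the right value depends on the resampling actually used.

```r
# Hypothetical helper (not part of OMLbots): map a fraction p in [0, 1]
# to an absolute min.node.size for a data set with n_rows observations.
scale_min_node_size <- function(p, n_rows) {
  round(2^(log(n_rows, 2) * p))  # algebraically the same as n_rows^p
}

n_total <- 1000  # made-up number of rows in the full data set
p <- 0.8         # made-up sampled value of the min.node.size fraction

# Scaling by the full data set, as in the current code.
scale_min_node_size(p, n_total)       # 251

# Scaling by n / 10, as in the proposal above.
scale_min_node_size(p, n_total / 10)  # 40

# Fraction from which the full-data scaling reaches at least n / 10;
# this is the lower end of the interval [log(n/10)/log(n), 1].
threshold <- log(n_total / 10) / log(n_total)
```

For any sampled fraction at or above `threshold`, the full-data scaling returns a min.node.size of at least `n_total / 10`, which is the range the comment above expects to behave identically.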

Thanks for the note. Not sure if we're still fixing this.

We're currently in the process of writing a new version of the bot here: https://github.com/pfistfl/OMLRandomBotv2

This means that this version will (hopefully) soon be deprecated.