catboost/catboost

Caret object: Inconsistent grid creation with documentation

Rek27 opened this issue · 3 comments

Problem: According to the documentation, the optimal range for the tree depth hyperparameter is 4-10, and on CPU it can be any integer up to 16. The function in catboost.caret that generates the tuning grid does not respect this limit. In random search, depth is sampled from 1 to tuneLength, so with tuneLength > 16 the search can draw depth values above 16, and the metric value (Accuracy in my case) comes back as NaN.

catboost.caret

...
$grid
function (x, y, len = 5, search = "grid")
{
    if (search == "grid") {
        grid <- expand.grid(depth = c(2, 4, 6), learning_rate = exp(-(0:len)),
            iterations = 100, l2_leaf_reg = 1e-06, rsm = 0.9,
            border_count = 255)
    }
    else {
        grid <- data.frame(depth = sample.int(len, len, replace = TRUE),
            learning_rate = runif(len, min = 1e-06, max = 1),
            iterations = rep(100, len), l2_leaf_reg = sample(c(0.1,
                0.001, 1e-06), len, replace = TRUE), rsm = sample(c(1,
                0.9, 0.8, 0.7), len, replace = TRUE), border_count = sample(c(255),
                len, replace = TRUE))
    }
    return(grid)
}
...
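
The dependence on tuneLength can be checked without training anything by running the depth expression from the random-search branch on its own. A small sketch (len = 30 and the seed are arbitrary example values, not taken from my actual run):

```r
# Reproduce only the depth column of the random-search branch shown above.
set.seed(1)   # arbitrary seed, only so that repeated runs give the same draw
len <- 30     # example tuneLength above the documented CPU limit of 16

# Same expression as in the grid function: draws len integers from 1..len.
depth <- sample.int(len, len, replace = TRUE)

range(depth)      # bounded by len, not by 16
sum(depth > 16)   # how many candidates exceed the documented limit
```

The upper bound of the draw is len itself, so any tuneLength above 16 makes out-of-range depth values possible.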

Shouldn't depth in the grid be capped at 16 at most, rather than depending on tuneLength?
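
One way the cap could look, as a minimal sketch only: this is not the actual fix. The name capped_grid and the max_depth argument are hypothetical, and 16 is simply the CPU limit quoted from the documentation. The sketch also replaces sample(c(255), ...) with rep(255, len), because in R sample() on a single number n draws from 1:n, which is presumably not what was intended for border_count.

```r
# Hypothetical replacement for the $grid element; not part of catboost.
# max_depth = 16 is an assumption based on the documented CPU limit.
capped_grid <- function(x, y, len = 5, search = "grid", max_depth = 16L) {
  if (search == "grid") {
    # Grid search is unchanged: depth is already fixed to 2, 4, 6.
    grid <- expand.grid(depth = c(2, 4, 6), learning_rate = exp(-(0:len)),
                        iterations = 100, l2_leaf_reg = 1e-06, rsm = 0.9,
                        border_count = 255)
  } else {
    grid <- data.frame(
      # Draw depth from 1..min(len, max_depth) so it never exceeds the cap.
      depth = sample.int(min(len, max_depth), len, replace = TRUE),
      learning_rate = runif(len, min = 1e-06, max = 1),
      iterations = rep(100, len),
      l2_leaf_reg = sample(c(0.1, 0.001, 1e-06), len, replace = TRUE),
      rsm = sample(c(1, 0.9, 0.8, 0.7), len, replace = TRUE),
      # rep() keeps border_count at 255; sample(c(255), ...) draws from 1:255.
      border_count = rep(255, len)
    )
  }
  grid
}

# Example: a random search with 50 candidates stays within the cap.
g <- capped_grid(NULL, NULL, len = 50, search = "random")
```

As a workaround until a fix is released, it should be possible to copy the model list, overwrite its grid element (model <- catboost.caret; model$grid <- capped_grid) and pass method = model to caret's train().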

catboost version: 1.2.2
Operating System: Windows 10 x64
CPU: AMD Ryzen 5 PRO 5650U
GPU: not using

Many thanks for paying attention!
The fix is on its way to github.com/catboost ...

My apologies for mixing up issues 2609 and 2606 :-\