Why does `get_incumbent_id` select the incumbent only from runs with `max_budget`?
2533245542 opened this issue · 1 comment
Isn't it weird that `get_incumbent_id()` only picks the incumbent based on the losses of runs on `max_budget`?
I think the incumbent should be selected from all configs, regardless of budget.
Let's say we use epochs as the budget, and:
- config A has a loss of 5 after 20 epochs;
- config B has a loss of 3 after 10 epochs.

Then `get_incumbent_id()` will report config A as the incumbent, so the user will build a model with config A trained for 20 epochs. In fact, the user should build the model with config B, because config A is overfitting at 20 epochs.
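To make the two selection rules concrete, here is a small self-contained sketch of the config A / config B scenario. The `Run` tuple and both helper functions are hypothetical stand-ins, not HpBandSter's API. The first helper only mimics the "max budget only" behavior described in the question.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    """Hypothetical stand-in for one finished evaluation."""
    config_id: str
    budget: float  # number of epochs in this example
    loss: float


def incumbent_at_max_budget(runs, max_budget) -> Optional[Run]:
    """Lowest-loss run among those evaluated on `max_budget` only."""
    candidates = [r for r in runs if r.budget == max_budget]
    return min(candidates, key=lambda r: r.loss) if candidates else None


def incumbent_over_all_budgets(runs) -> Optional[Run]:
    """Lowest-loss run regardless of the budget it was evaluated on."""
    return min(runs, key=lambda r: r.loss) if runs else None


runs = [Run("A", budget=20, loss=5.0), Run("B", budget=10, loss=3.0)]
print(incumbent_at_max_budget(runs, max_budget=20).config_id)  # A
print(incumbent_over_all_budgets(runs).config_id)              # B
```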
I'd also suggest something like `get_incumbent_id_and_budget()`, so users can get both the best hyperparameter configuration and the best budget for building their model.
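A minimal sketch of the suggested helper follows. It is hypothetical and not part of HpBandSter. It only assumes that each run object exposes `config_id`, `budget` and `loss` attributes, with `loss` set to `None` for failed runs. As far as I know, the run objects returned by HpBandSter's `Result.get_all_runs()` carry these attributes, but check this against your version.

```python
from typing import Hashable, Iterable, NamedTuple, Optional, Tuple


class Run(NamedTuple):
    """Hypothetical run record; only these three attributes are used below."""
    config_id: Hashable
    budget: float
    loss: Optional[float]  # None for failed/crashed evaluations


def get_incumbent_id_and_budget(runs: Iterable) -> Optional[Tuple[Hashable, float]]:
    """Return (config_id, budget) of the lowest-loss run across *all* budgets.

    Runs without a loss are skipped; returns None if no run has a loss.
    """
    finished = [r for r in runs if r.loss is not None]
    if not finished:
        return None
    best = min(finished, key=lambda r: r.loss)
    return best.config_id, best.budget


runs = [
    Run("A", budget=20, loss=5.0),
    Run("B", budget=10, loss=3.0),
    Run("C", budget=10, loss=None),  # crashed run, ignored
]
print(get_incumbent_id_and_budget(runs))  # ('B', 10)
```

Returning the budget alongside the ID is what would let the user retrain config B for 10 epochs rather than 20.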
The answer to your question depends on your problem. This implementation assumes that an evaluation on a larger budget is more reliable, and that the goal is to optimize performance on the largest budget. BOHB's model will at some point disregard evaluations on the smaller budgets and focus only on the largest one. That is why the default behavior is the one you observed.
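The "model eventually focuses on the largest budget" behavior can be illustrated with the rule described in the BOHB paper: the density model is built for the largest budget that already has enough observations. Below is a simplified sketch. The function name is hypothetical, and `min_points` is only a stand-in for a threshold like HpBandSter's `min_points_in_model` setting.

```python
from collections import Counter


def model_budget(observed_budgets, min_points):
    """Largest budget with at least `min_points` observations, or None.

    `observed_budgets` holds one entry per finished evaluation.
    """
    counts = Counter(observed_budgets)
    eligible = [b for b, n in counts.items() if n >= min_points]
    return max(eligible) if eligible else None


# Early on, only the smaller budgets have enough data ...
history = [1] * 9 + [3] * 3 + [9] * 1
print(model_budget(history, min_points=3))  # 3

# ... later the largest budget does, and the model switches to it.
history += [9] * 4
print(model_budget(history, min_points=3))  # 9
```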
You might use BOHB to tune a neural network, where a larger budget does not necessarily mean better performance (e.g. when early stopping improves the result). But if you consider a problem where the noise depends on the budget, e.g. when the budget is the number of CV folds, things are different. There, a higher budget means a higher-fidelity evaluation, which is more trustworthy.
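The CV-fold case can be illustrated with a toy simulation. It assumes that each fold's score is the true score plus independent Gaussian noise. Real CV folds are correlated, so this overstates the effect. Under that assumption, averaging over more folds shrinks the spread of the estimate roughly like `noise_sd / sqrt(k)`, which is what "higher fidelity" means here.

```python
import random
import statistics


def cv_estimate(true_score, n_folds, noise_sd, rng):
    """Mean of `n_folds` noisy per-fold scores (toy model: independent Gaussian noise)."""
    return statistics.fmean(true_score + rng.gauss(0.0, noise_sd) for _ in range(n_folds))


def estimate_spread(n_folds, repeats=2000, true_score=0.8, noise_sd=0.05, seed=0):
    """Standard deviation of the CV estimate across many repeated evaluations."""
    rng = random.Random(seed)
    estimates = [cv_estimate(true_score, n_folds, noise_sd, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)


for k in (2, 5, 10):
    # The spread should shrink roughly like noise_sd / sqrt(k).
    print(k, round(estimate_spread(k), 4))
```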
You can use `get_incumbent_trajectory` to get the behavior you want.
The function has two flags, `bigger_is_better` and `non_decreasing_budget`. If you set both to `False`, it returns a dict in which the last entry should be the config with the best loss ever seen, on any budget.
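To see why turning both flags off makes the last entry the overall best loss, here is a simplified re-implementation of how I understand the incumbent update rule. It is a sketch, not HpBandSter's code. `trajectory` and `Run` are hypothetical, and runs are assumed to be sorted by finish time. With HpBandSter itself, the call would be along the lines of `res.get_incumbent_trajectory(bigger_is_better=False, non_decreasing_budget=False)` on a `Result` object `res`.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """Hypothetical run record."""
    config_id: str
    budget: float
    loss: Optional[float]


def trajectory(runs: List[Run], bigger_is_better: bool, non_decreasing_budget: bool) -> List[Run]:
    """Return the successive incumbents, assuming `runs` is sorted by finish time."""
    incumbents: List[Run] = []
    for r in runs:
        if r.loss is None:  # skip failed runs
            continue
        if not incumbents:
            incumbents.append(r)
            continue
        current = incumbents[-1]
        # A strictly lower loss always qualifies ...
        new = r.loss < current.loss
        # ... optionally, so does any run on a larger budget, even with a worse loss ...
        if bigger_is_better and r.budget > current.budget:
            new = True
        # ... and optionally, a run on a smaller budget can never replace the incumbent.
        if non_decreasing_budget and r.budget < current.budget:
            new = False
        if new:
            incumbents.append(r)
    return incumbents


runs = [Run("B", 10, 3.0), Run("A", 20, 5.0)]  # B finished first
print([r.config_id for r in trajectory(runs, True, True)])    # ['B', 'A']
print([r.config_id for r in trajectory(runs, False, False)])  # ['B']
```

With both flags off, the incumbent only changes on a strictly lower loss, so the last entry is the minimum over all runs.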
Please feel free to implement `get_incumbent_id_and_budget()` and open a PR.