Tuning n_estimators for RandomForests doesn't make sense

Question


GaelVaroquaux opened this issue 3 years ago · 3 comments

In the following:
https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html
We tune n_estimators for RandomForest, which doesn't make much sense: typically, the more trees, the better.
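One way to check the "more is better" intuition without a grid search is to grow a single forest incrementally with `warm_start=True` and watch the out-of-bag (OOB) score as trees are added. This is a minimal sketch. It assumes a synthetic regression dataset from `make_regression`, and the tree counts are arbitrary values chosen for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# warm_start=True keeps the already-fitted trees when n_estimators is increased;
# oob_score=True recomputes the OOB R^2 over all trees after each fit.
forest = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)

oob_scores = {}
for n_trees in [25, 50, 100, 200, 400]:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)  # only the newly requested trees are fitted
    oob_scores[n_trees] = forest.oob_score_

for n_trees, score in oob_scores.items():
    print(f"{n_trees:4d} trees: OOB R^2 = {score:.3f}")
```

Look for a curve that flattens out, not one with an interior optimum. Past the flat point, extra trees mainly add fitting and prediction time.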

Answer 1 · 2022-03-07T10:36:35.000Z

I realize that this is touched on further down the page, but not very clearly IMHO. It should be stated clearly that tuning the number of estimators for RandomForests does not make much sense and is likely to waste resources.

Answer 2 · 2022-03-16T09:07:24.000Z

But we don't have a smart API to efficiently explore how many trees are needed.

We could do one big fit and then run an a posteriori analysis of how many trees are actually necessary, by subsampling the estimators_ attribute (and resetting n_estimators for consistency) in a clone, but this is not standard scikit-learn code.
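The a posteriori analysis described above could be sketched as follows. The trees of a random forest are fitted independently, so the first k trees of a big forest form a valid k-tree forest. The sketch assumes a regression forest and a held-out test split. The dataset, the 300-tree budget and the tree counts checked are all illustrative choices. One caveat: `sklearn.base.clone` returns an *unfitted* estimator, so this sketch uses `copy.copy` instead. That keeps the fitted attributes and only swaps in a truncated `estimators_` list. The `truncated_forest` helper is hypothetical and not part of scikit-learn.

```python
import copy

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One big fit: this is the only expensive step.
big_forest = RandomForestRegressor(n_estimators=300, random_state=0)
big_forest.fit(X_train, y_train)


def truncated_forest(forest, n_trees):
    """Hypothetical helper: a fitted copy of `forest` keeping only its first n_trees trees."""
    # Shallow copy: fitted attributes and the trees themselves are shared,
    # but slicing creates a new list, so the original forest is left untouched.
    small = copy.copy(forest)
    small.estimators_ = forest.estimators_[:n_trees]
    small.n_estimators = n_trees  # keep the hyperparameter consistent with estimators_
    return small


test_scores = {}
for n_trees in [10, 30, 100, 300]:
    small = truncated_forest(big_forest, n_trees)
    test_scores[n_trees] = r2_score(y_test, small.predict(X_test))
    print(f"{n_trees:4d} trees: test R^2 = {test_scores[n_trees]:.3f}")
```

This scans tree counts with a single fit plus one prediction pass per count, instead of one full fit per candidate value.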

Answer 3 · 2022-03-16T09:45:01.000Z

The problem with this code is that it teaches people the wrong behavior. We already see too many people choosing n_estimators by cross-validation, which leads to 1) overfitting and 2) wasted computing power. We would be better off removing the tuning of this parameter from the MOOC.
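As an illustration of that recommendation, a search could fix n_estimators once, at a reasonably large value, and spend the cross-validation budget on other hyperparameters. This is a minimal sketch on a synthetic regression dataset. The choice of `max_features` and `min_samples_leaf` and their grid values are illustrative assumptions, not taken from the MOOC.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# n_estimators is set once, not searched: adding trees mainly costs compute.
forest = RandomForestRegressor(n_estimators=100, random_state=0)

# Only hyperparameters that control the fit of each tree are searched.
param_grid = {
    "max_features": [0.5, 1.0],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(forest, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

If compute is the constraint, n_estimators can be lowered. That is a budget decision, not a value to pick by cross-validation.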