INRIA/scikit-learn-mooc

Tuning n_estimators for RandomForests doesn't make sense

GaelVaroquaux opened this issue · 3 comments

In the following:
https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html
We tune n_estimators for RandomForest, which doesn't make much sense. Typically, the more trees, the better.

I realize that this is somewhat mentioned below, but not very clearly, IMHO. I think it should be clearly stated that tuning the number of estimators for RandomForests does not make much sense and is likely to be a waste of resources.

However, we don't have a smart API to efficiently explore how many trees are needed.

We could do a big fit and then do an a posteriori analysis of how many trees are actually necessary by subsampling the estimators_ attribute (and re-setting n_estimators for consistency) in a clone, but this is not standard scikit-learn code.
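
A minimal sketch of that a posteriori analysis, under a few assumptions: the helper name `score_by_forest_size`, the synthetic dataset, and the list of sizes are all made up for illustration. It uses a shallow `copy.copy` of the fitted forest rather than `sklearn.base.clone`, since `clone` returns an unfitted estimator. Overwriting `estimators_` by hand relies on implementation details of the forest and is not a supported API.

```python
import copy

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def score_by_forest_size(forest, X, y, sizes):
    """Score truncated copies of an already fitted forest (hypothetical helper)."""
    scores = {}
    for k in sizes:
        # Shallow copy: the trees are shared, only the attributes are rebound,
        # so the original forest is left untouched.
        sub = copy.copy(forest)
        sub.estimators_ = forest.estimators_[:k]  # keep the first k trees
        sub.n_estimators = k  # re-set for consistency
        scores[k] = sub.score(X, y)
    return scores


# Synthetic data, chosen only to make the example self-contained.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One big fit, then inspect how the test score evolves with the number of trees.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
scores = score_by_forest_size(forest, X_test, y_test, sizes=[1, 5, 10, 25, 50])
for k, r2 in scores.items():
    print(f"{k:>3} trees: R^2 = {r2:.3f}")
```

The forest is fitted only once; each truncated copy reuses the first k trees, so the cost of the analysis is prediction time only. The score would usually be expected to plateau past some number of trees, which is the information a grid search over n_estimators pays a full refit per candidate to obtain.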