databricks/spark-sklearn

best_params_ missing on GridSearchCV

dklischies opened this issue · 3 comments

The best_params_ dict seems to be missing from GridSearchCV, even if refitting is enabled.
grid_search.py#L195 refers to that attribute, and it is determined in grid_search.py#L371, but it is never actually exposed after fitting. This contradicts both your own docs and the scikit-learn 0.19.1 and 0.20.0 docs. Was this attribute purposefully not exposed, or is this a bug? Depending on your answer, I'd be happy to provide a PR that changes either the docs or the code to resolve this.

See also #37 which uncovered this a year ago.

See #79 as a related issue. I think the issue is that the docs were just copied from scikit-learn, but this subclass can't set the attributes. I think we need to pull them from the docs. Therefore it ends up being a matter of fixing the warnings in #37.

I'm reopening this as there is clearly an issue here, though the 'quick fixes' don't work:

#37
#79
#84
#85

See in particular #79

It seems like the intent is to set these attributes for parity with scikit-learn, but they can't be set directly. There may be a simple answer; I don't know it yet. Ideas welcome here!
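For anyone trying to picture how an attribute "can't be set directly": one possible mechanism is a parent class that exposes the name as a read-only property, so plain assignment in a subclass raises AttributeError. This is only an assumption about the cause, not something confirmed in this thread, and the classes below (ParentSearch, ChildSearch, try_assign) are hypothetical stand-ins rather than scikit-learn or spark-sklearn code.

```python
class ParentSearch:
    """Hypothetical parent exposing best_params_ as a read-only property."""

    @property
    def best_params_(self):
        # Derived on access from the stored results; there is no setter.
        return self.cv_results_["params"][self.best_index_]


class ChildSearch(ParentSearch):
    """Hypothetical subclass that stores only the raw results in fit()."""

    def fit(self):
        self.cv_results_ = {"params": [{"C": 1}, {"C": 10}]}
        self.best_index_ = 1
        return self


def try_assign(search, value):
    """Return True if best_params_ could be assigned, False otherwise."""
    try:
        search.best_params_ = value
    except AttributeError:
        # A property without a setter rejects direct assignment.
        return False
    return True


search = ChildSearch().fit()
assigned = try_assign(search, {"C": 1})
print(assigned, search.best_params_)
```

If that is what is happening, the attribute would still be readable once cv_results_ and best_index_ are set, which might explain why some setups can't reproduce the problem.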

I'm on scikit-learn 0.19.2 and can't reproduce this issue when adding best_params_ manually, at least not when running run_tests.sh.

Is there another way to reproduce the bug?

I tried adding this at line 164 of base_search.py, then ran the tests:

 self.best_params_ = results["params"][self.best_index_]
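To make the idea behind that one-liner concrete, here is a minimal standalone sketch of deriving best_index_ and best_params_ from a results dict after fitting. It assumes the results follow the scikit-learn cv_results_ layout ("params", "mean_test_score", "rank_test_score"); FittedSearchSketch and the sample numbers are made up for illustration and are not the spark-sklearn class.

```python
class FittedSearchSketch:
    """Hypothetical stand-in for a search object after fit()."""

    def __init__(self, results):
        self.cv_results_ = results
        ranks = results["rank_test_score"]
        # Rank 1 is best; pick the first index holding the smallest rank.
        self.best_index_ = min(range(len(ranks)), key=ranks.__getitem__)
        self.best_score_ = results["mean_test_score"][self.best_index_]
        # Same lookup as the line proposed above.
        self.best_params_ = results["params"][self.best_index_]


# Illustrative results only; these are not real measurements.
results = {
    "params": [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}],
    "mean_test_score": [0.81, 0.93, 0.90],
    "rank_test_score": [3, 1, 2],
}

search = FittedSearchSketch(results)
print(search.best_index_, search.best_params_)
```

A quick check for the bug would then be whether hasattr(search, "best_params_") holds on a fitted spark-sklearn GridSearchCV.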