databricks/spark-sklearn

scikit-learn >= 0.20 support

yishilin14 opened this issue · 5 comments

Is there any plan to support scikit-learn >= 0.20?

I doubt it, or at least, I don't think I can make the change myself. If there's a good PR that fixes it I'll review it.

set92 commented

@srowen, does that mean this library is deprecated? I assume it means the library will no longer be updated (unless someone else comes along and updates it, and even then it will not receive any other kind of update).

Also, is there another good option for parallelizing scikit-learn scripts?

There aren't likely to be major updates here, but I wouldn't say deprecated. I'm not aware of anyone actively working on it. Within Databricks, the recommended parameter tuning framework going forward is based on hyperopt, which would work with scikit-learn (and many other things) without trying to integrate tightly into it. https://databricks.com/blog/2019/06/07/hyperparameter-tuning-with-mlflow-apache-spark-mllib-and-hyperopt.html

Hi @srowen, any advice on how to install this package with pip now that there are no longer any compatible versions of scikit-learn? The oldest scikit-learn available through pip is 0.19.2, which spark-sklearn seems to be incompatible with.

0.19.2 should work OK; what are you seeing?