Namespace issue with pyspark.ml and pyspark.mllib
ovlaere opened this issue · 2 comments
I tried to run the default example from the README on Spark:

```python
from sklearn import svm, grid_search, datasets
from spark_sklearn import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
clf = GridSearchCV(sc, svr, parameters)
clf.fit(iris.data, iris.target)
```

but got the following error:

```
ImportError: No module named linalg
```
The import that fails is `pyspark.ml.linalg`, on this line in `converter.py` in spark_sklearn. We are running Spark 1.6, and according to the documentation, `pyspark.ml.linalg` only exists in Spark 2.0 and above; through 1.6, `linalg` lives under `pyspark.mllib.linalg` instead.
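Code that has to run against both Spark lines could key the namespace choice off the version string. A minimal sketch, where `linalg_module` is a hypothetical helper and not part of spark_sklearn:

```python
def linalg_module(spark_version):
    """Return the dotted path of the linalg module for a given Spark version.

    pyspark.ml.linalg was introduced in Spark 2.0; earlier releases only
    ship pyspark.mllib.linalg.
    """
    major = int(spark_version.split(".")[0])
    return "pyspark.ml.linalg" if major >= 2 else "pyspark.mllib.linalg"

print(linalg_module("1.6.0"))  # pyspark.mllib.linalg
print(linalg_module("2.0.1"))  # pyspark.ml.linalg
```

The same dispatch could equally be done with a `try`/`except ImportError` around the Spark 2 import, falling back to the 1.x location.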
I'm trying to figure out whether this is a version issue on my end, given that the README mentions Spark 2.0 compatibility. But if it is an issue with spark_sklearn itself, it would have been broken since at least Spark 1.6.0. Can someone confirm?
In case anyone else encounters this compatibility error with Spark 1.6, here is what fixed the problem for me:

```
pip install spark-sklearn==0.1.2
```
Yes, the most recent versions of spark-sklearn require Spark 2.