How to import spark-stemming via pyspark
GeorgesAlkhouri opened this issue · 2 comments
Hello,
I want to try your stemming package for Spark, so I included the package in my spark-submit command.
./spark-submit --packages master:spark-stemming:0.1.1 run.py
But when I want to import the Stemmer via pyspark it cannot be found.
I tried to import it like this
from pyspark.mllib.feature import Stemmer
and this
from pyspark.ml.feature import Stemmer
Currently, I am using Spark version 2.0.0.
Thanks
Hi @GeorgesAlkhouri - this package is a Scala package, so it is not possible to import it directly. Instead, you (or someone else) would need to write a Python wrapper. For example, in Spark, the Tokenizer class is written in Scala but a Python wrapper is then provided to allow importing from pyspark.ml.feature.
Notably, if you do add a wrapper, you'd then have to import your wrapper from where you wrote it, since it would not be in the pyspark.ml.feature module.
Got it, thanks.