master/spark-stemming

How to import spark-stemming via pyspark

GeorgesAlkhouri opened this issue · 2 comments

Hello,

I want to try your stemming package for Spark, so I added the package to my spark-submit command:

./spark-submit --packages master:spark-stemming:0.1.1 run.py

But when I try to import the Stemmer in pyspark, it cannot be found.

I tried importing it like this:

from pyspark.mllib.feature import Stemmer

and like this:

from pyspark.ml.feature import Stemmer

Currently, I am using Spark version 2.0.0.

Thanks

Hi @GeorgesAlkhouri - this package is a Scala package, so it cannot be imported directly from pyspark. Instead, you (or someone else) would need to write a Python wrapper. For example, in Spark itself, the Tokenizer class is written in Scala, and a Python wrapper is provided so that it can be imported from pyspark.ml.feature.

Notably, if you do add a wrapper, you'd then have to import your wrapper from where you wrote it, since it would not be in the pyspark.ml.feature module.

Got it, thanks.