Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)
SparklingML's goal is to expose additional machine learning stages for Spark with the pipeline interface.
Super early! Come join!
Dev mailing list: https://groups.google.com/forum/#!forum/sparklingml-dev
SparklingML consists of two components: a Python component and a Java/Scala component. The Python component depends on the Java/Scala component being pre-built, which can be done by running ./build/sbt package.
The Python component depends on the packages listed in requirements.txt (which are also declared in setup.py). Development and testing also require spacy, nose, codecov, pylint, and pep8.
The script build_and_package.sh builds & tests both the Scala and Python code.
Are your DocTests failing with the following?
Expected nothing
Got:
    Warning: no model found for 'en'
    Only loading the 'en' tokenizer.
Make sure you've installed spacy & the en language pack (python -m spacy download en).
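To confirm the language pack is in place before running the doctests, a small check along these lines may help. This is a minimal sketch, not part of SparklingML: the function name check_spacy_en and its status strings are hypothetical, and it assumes a spaCy version in which spacy.load raises OSError when the model is missing (spaCy 2.x and later). Older versions print the warning shown above instead of raising, so this check would report "ok" there.

```python
import importlib.util


def check_spacy_en(model_name="en"):
    """Return a short status string: 'spacy-missing', 'model-missing', or 'ok'.

    Hypothetical helper for illustration only; it is not part of SparklingML.
    """
    # First make sure spaCy itself is importable.
    if importlib.util.find_spec("spacy") is None:
        return "spacy-missing"

    import spacy

    try:
        # Assumption: a missing model raises OSError (spaCy 2.x and later).
        spacy.load(model_name)
    except OSError:
        return "model-missing"
    return "ok"


if __name__ == "__main__":
    print(check_spacy_en())
```

If the result is "model-missing", rerunning python -m spacy download en is the likely fix.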
SparklingML is not yet ready for production use.
SparklingML is licensed under the Apache 2 license. Some additional components may be under a different license.