PySpark-Machine-Learning

A collection of machine learning examples using PySpark

Primary language: Jupyter Notebook

To clone and run the tutorials, install Anaconda Python along with pyspark and any other required packages.
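Before opening the tutorials, it can help to confirm that the packages are importable in the active environment. The following is a minimal sketch, not part of the repository: only pyspark is named in this README, so the rest of the `REQUIRED` list and the helper name `missing_packages` are assumptions to adjust as needed.

```python
import importlib.util

# Assumed package list: only pyspark is named in this README;
# numpy and pandas are illustrative guesses.
REQUIRED = ["pyspark", "numpy", "pandas"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any package reported as missing can then be installed into the Anaconda environment before running the tutorials.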

Basic Spark

To get through the basics quickly, I recommend an online course from Udemy and this great GitHub repo.

Natural Language Processing with PySpark

Watch as John Hogue walks through a practical example of a data pipeline to feed textual data for tagging with PySpark and ML. Learn to leverage great existing Python libraries in Spark such as NLTK and how to use some of Spark’s newer features. A GitHub Repo of source code, training and test sets of data will be provided for attendees to explore and play with.