/spark_projects

Some projects built with Apache Spark (specifically PySpark)

Primary Language: Python

About

These are PySpark demonstrations of natural language processing (NLP) tasks.

Every model is trained on data from Professor Julian McAuley's Amazon product dataset, specifically the subset titled "Cell Phones and Accessories".

Files and Directories

/models

Serialized trained PySpark models and pipelines.

/metrics

Metrics produced after training the models.

/classification

Contains a series of files demonstrating text classification with Apache Spark using Amazon product reviews.

/collaborative_filtering

Contains files demonstrating collaborative filtering with Apache Spark.

helper_functions.py

Contains helper functions for training models and loading data.