These are PySpark demonstrations for natural language processing (NLP).
Each model uses data from Professor Julian McAuley's Amazon product dataset, specifically the "Cell Phones and Accessories" subset.
Serialized PySpark models and pipelines produced by training.
Metrics recorded after training the models.
Contains a series of files demonstrating text classification with Apache Spark using Amazon product reviews.
Contains files demonstrating collaborative filtering on the same review data.
Contains helper functions for training models and loading data.