K-Means-in-Spark

K-Means Implementation in PySpark

Primary Language: Jupyter Notebook

This notebook demonstrates how to run K-Means clustering on PySpark DataFrames. Doing so requires converting the DataFrame's original columns into the single vector column that Spark's ML library accepts as input. Several values for the number of clusters (K) were tried, and the best K was chosen on the basis of the Silhouette score.