This notebook demonstrates how to perform K-Means clustering on PySpark DataFrames. This requires converting the DataFrame's original columns into a form accepted by Spark's ML library. Several values for the number of clusters (K) were tried, and the best K was chosen on the basis of the Silhouette Score.