CustomerSegmentationWithSpark

Customer segmentation using RFM Analysis and K-means clustering in Apache Spark

Primary Language: Scala

A model to segment customers based on their previous purchase history. We use RFM (recency, frequency, monetary) analysis and K-means clustering to group similar customers, and we use Apache Spark to code the model. The dataset used here is the E-commerce data provided on Kaggle: https://www.kaggle.com/carrie1/ecommerce-data/. One issue I faced with the dataset is that Spark could not process the date format in the InvoiceDate column. I was not sure whether this was caused by my environment, so I used a Python script (provided in this repo) to reformat InvoiceDate.
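To make the approach concrete, here is a minimal sketch of how RFM features and K-means clusters could be computed in Spark. It is not the code from this repo. It assumes the Kaggle CSV is saved locally as `data.csv` (a hypothetical path), that it has the columns `InvoiceNo`, `Quantity`, `InvoiceDate`, `UnitPrice` and `CustomerID`, and that `InvoiceDate` values look like `12/1/2010 8:26`. The pattern `M/d/yyyy H:mm`, the choice of `k = 4` and the object name `RfmKMeansSketch` are illustrative assumptions as well. Parsing the date with an explicit pattern is one possible alternative to reformatting it beforehand; whether it works may depend on the Spark version and environment.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RfmKMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RfmKMeansSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path; every column is read as a string and cast explicitly.
    val raw = spark.read.option("header", "true").csv("data.csv")

    val sales = raw
      .filter(col("CustomerID").isNotNull)
      // Assumed date pattern, e.g. "12/1/2010 8:26".
      .withColumn("InvoiceTs", to_timestamp(col("InvoiceDate"), "M/d/yyyy H:mm"))
      .filter(col("InvoiceTs").isNotNull)
      .withColumn("Amount",
        col("Quantity").cast("double") * col("UnitPrice").cast("double"))

    // Use the latest invoice in the data as the reference point for recency.
    val refDate = sales.agg(max("InvoiceTs")).first().getTimestamp(0)

    val rfm = sales.groupBy("CustomerID").agg(
      datediff(lit(refDate), max("InvoiceTs")).as("Recency"),     // days since last purchase
      countDistinct("InvoiceNo").cast("double").as("Frequency"), // number of invoices
      sum("Amount").as("Monetary")                               // total spend
    )

    // Combine the three features into one vector and standardize them,
    // since K-means is sensitive to differences in scale.
    val assembled = new VectorAssembler()
      .setInputCols(Array("Recency", "Frequency", "Monetary"))
      .setOutputCol("rawFeatures")
      .transform(rfm)

    val scaled = new StandardScaler()
      .setInputCol("rawFeatures")
      .setOutputCol("features")
      .setWithMean(true)
      .setWithStd(true)
      .fit(assembled)
      .transform(assembled)

    // k = 4 is an arbitrary choice for illustration.
    val model = new KMeans()
      .setK(4)
      .setSeed(42L)
      .setFeaturesCol("features")
      .setPredictionCol("segment")
      .fit(scaled)

    model.transform(scaled)
      .select("CustomerID", "Recency", "Frequency", "Monetary", "segment")
      .show(10, truncate = false)

    spark.stop()
  }
}
```

In practice the number of clusters would be chosen by comparing several values of `k`, and returns (negative quantities) might need separate handling before the monetary value is summed.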