Topic: Customer Segmentation
Method: Data Wrangling + RFM
Tools: R + Tableau
Data source: Online Retail on Kaggle
Customer segmentation is an important task in marketing and sales because it helps a business market the right products to the right customer groups. Machine learning offers several methods for it, such as K-Nearest Neighbors in supervised learning or K-Means clustering in unsupervised learning. Another method, widely used in database marketing and direct marketing but unrelated to machine learning or artificial intelligence, is RFM, which stands for Recency, Frequency, and Monetary Value. In this project, I show step by step how data wrangling in R can help marketers segment customers using the RFM method.
Many marketers now use Tableau to visualize data because its interface is interactive and user-friendly and it requires no coding background. Tableau visualizations can be turned into informative reports for stakeholders with different backgrounds. Thus, after producing the customer segment data, I build Tableau dashboards to visualize the value of these segments to the business.
In this project, I work with the Online Retail data.
There are 240,007 observations of 8 variables, as follows:
Variable name | Description | Data type |
---|---|---|
InvoiceNo | Invoice number, with 12,468 unique values | character |
StockCode | Stock code, with 3,645 unique values | character |
Description | Product name, with 3,606 unique values | character |
Quantity | Number of product(s) ordered | integer |
InvoiceDate | Date and time that an order was placed, ranging from December 2010 to June 2011 | character |
UnitPrice | Price per unit of the product ordered | double |
CustomerID | Unique identifier for each customer, with 2,975 unique values | character |
Country | Country where the order was placed, with 38 unique values | character |
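
As a rough illustration of how the Recency, Frequency, and Monetary values could be derived from the columns above, the base R sketch below works on a small made-up data frame. The toy rows, the `snapshot` reference date, and the helper columns `Amount` and `DaysAgo` are assumptions for illustration; they are not taken from the project's source code, which may compute these values differently.

```r
# Toy transactions using column names from the table above (all values are made up)
tx <- data.frame(
  InvoiceNo   = c("A1", "A1", "A2", "B1", "C1", "C2"),
  CustomerID  = c("1", "1", "1", "2", "3", "3"),
  Quantity    = c(2L, 1L, 5L, 10L, 1L, 3L),
  UnitPrice   = c(2.5, 4.0, 1.0, 0.5, 20.0, 2.0),
  InvoiceDate = as.Date(c("2011-01-05", "2011-01-05", "2011-03-01",
                          "2010-12-10", "2011-02-14", "2011-05-30")),
  stringsAsFactors = FALSE
)

# Assumed reference date: the day after the last transaction in the data
snapshot <- max(tx$InvoiceDate) + 1

# Hypothetical helper columns: line amount and days elapsed before the snapshot
tx$Amount  <- tx$Quantity * tx$UnitPrice
tx$DaysAgo <- as.numeric(snapshot - tx$InvoiceDate)

# Recency: days since the most recent order
recency   <- tapply(tx$DaysAgo, tx$CustomerID, min)
# Frequency: number of distinct invoices
frequency <- tapply(tx$InvoiceNo, tx$CustomerID, function(x) length(unique(x)))
# Monetary: total amount spent
monetary  <- tapply(tx$Amount, tx$CustomerID, sum)

rfm <- data.frame(
  CustomerID = names(recency),
  Recency    = as.numeric(recency),
  Frequency  = as.integer(frequency),
  Monetary   = as.numeric(monetary),
  stringsAsFactors = FALSE
)
print(rfm)
```

Each customer ends up with one row of raw R, F, and M values. Turning those into segments (for example by binning each value into score bands) is a separate step that this sketch leaves out.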
- Step-by-step instructions for data wrangling in R
- Tableau dashboard on Tableau Public. Because the full dataset is too large to upload to Tableau Public, the dashboard only visualizes data from December 2010 to April 2011.
|- data
|   -- raw               Includes raw data
|   -- clean             Includes clean data
|- doc                   Includes step-by-step instructions and other documentation
|- src                   Includes source code for this project
|- LICENSE               MIT License
|- online-retail.Rproj   R project
|- README.md             Project overview
- This customer segmentation is based solely on customer transactions. With data on customer demographics (e.g., gender, age, location, yearly salary), we might discover more interesting insights.
- Not all customers are segmented, due to a lack of `CustomerID` and purchase information. Some customers have only return transactions, but I need both purchase and return data to segment customers.
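
The second limitation can be sketched as a filter in base R. This is only an illustration on made-up rows. It assumes that returns are recorded as rows with a negative `Quantity` and that a customer is segmentable once they have at least one purchase. Neither assumption is confirmed by the text above, and the names `known`, `has_purchase`, `segmentable`, and `dropped_ids` are hypothetical.

```r
# Toy rows (made up); assumption: a negative Quantity marks a return
tx <- data.frame(
  CustomerID = c("1", "1", NA, "2", "3", "3"),
  Quantity   = c(5L, -1L, 3L, -2L, 4L, 6L),
  stringsAsFactors = FALSE
)

# Drop rows that cannot be attributed to a customer
known <- tx[!is.na(tx$CustomerID), ]

# Customers with at least one purchase (positive Quantity)
has_purchase <- unique(known$CustomerID[known$Quantity > 0])

# Keep all rows (purchases and returns) for those customers
segmentable <- known[known$CustomerID %in% has_purchase, ]

# Customers excluded because they only have return rows
dropped_ids <- setdiff(unique(known$CustomerID), has_purchase)
print(dropped_ids)
```

Keeping the return rows for retained customers lets returns offset purchases when totals are computed later.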
I want to try unsupervised learning with `kmeans` on this data to see whether this method segments customers similarly to or differently from RFM.
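
A minimal sketch of what that experiment might look like, using `kmeans()` from base R's `stats` package on a made-up RFM table. The toy values, the choice of two clusters, and the decision to standardize the columns with `scale()` are all assumptions for illustration, not results or settings from this project.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed for repeatability

# Toy RFM table (made up): three recent, frequent, high-spend customers
# and three lapsed, infrequent, low-spend customers
rfm <- data.frame(
  Recency   = c(5, 8, 3, 120, 150, 140),
  Frequency = c(12, 10, 15, 1, 2, 1),
  Monetary  = c(900, 750, 1200, 40, 60, 35)
)

# Standardize so that Monetary does not dominate the distance calculation
rfm_scaled <- scale(rfm)

# Assumed number of clusters; in practice this would need to be chosen, e.g. with an elbow plot
fit <- kmeans(rfm_scaled, centers = 2, nstart = 10)
rfm$Cluster <- fit$cluster

# Average R, F, and M per cluster, to compare against the RFM segments
cluster_profile <- aggregate(cbind(Recency, Frequency, Monetary) ~ Cluster,
                             data = rfm, FUN = mean)
print(cluster_profile)
```

Comparing the cluster labels with the RFM segment assigned to each customer (for instance with `table()`) would show where the two approaches agree.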