Final project submitted as a requirement for completing the Job Connector Data Science program at Purwadhika Startup & Coding School, Jakarta, Batch 09.
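RFM (Recency, Frequency, Monetary) is a method for analyzing customer value: how recently a customer purchased, how often they purchase, and how much they spend.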
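A minimal pandas sketch of how RFM values could be computed per customer. It assumes transaction-level rows with the CustomerID, InvoiceNo, InvoiceDate, Quantity and UnitPrice columns from the data dictionary in this README, plus a reference "snapshot" date for Recency. The compute_rfm helper and the toy rows are hypothetical illustrations, not the project's actual code.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Return one row per CustomerID with Recency, Frequency and Monetary values."""
    # Line-level revenue (assumed definition: Quantity * UnitPrice)
    df = df.assign(TotalPrice=df["Quantity"] * df["UnitPrice"])
    return df.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's latest purchase
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        # Number of distinct invoices (transactions)
        Frequency=("InvoiceNo", "nunique"),
        # Total amount spent
        Monetary=("TotalPrice", "sum"),
    )


# Hypothetical toy transactions
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-05", "2011-11-10"]),
    "Quantity": [2, 1, 5],
    "UnitPrice": [3.0, 10.0, 1.5],
})
rfm = compute_rfm(orders, pd.Timestamp("2011-12-10"))
print(rfm)
```

A common choice (an assumption here, not stated in the project) is to set the snapshot date one day after the most recent InvoiceDate in the data, so every customer gets a positive Recency.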
Original Data Set
Taken from: https://www.kaggle.com/mashlyn/online-retail-ii-uci
This dataset has 8 columns and 1,067,371 rows.
You've got to start with the customer experience and work back toward the technology - not the other way around.
-Steve Jobs.
- Does it matter to know what kinds of products your customers love?
- When is the best time to show empathy to your customers?
- How do you get in touch with your customers?
- InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
- StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
- Description: Product (item) name. Nominal.
- Quantity: The quantities of each product (item) per transaction. Numeric.
- InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
- UnitPrice: Unit price. Numeric. Product price per unit in pounds sterling.
- CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
- Country: Country name. Nominal. The name of the country where a customer resides.
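Because a leading 'c' in InvoiceNo marks a cancellation, it can help to flag those rows explicitly before cleaning. The sketch below is a hypothetical helper that checks the prefix case-insensitively (an assumption, since the raw file may use an uppercase 'C'); column names follow this README's data dictionary, and the raw file's headers may need renaming to match.

```python
import pandas as pd


def flag_cancellations(df: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean IsCancelled column based on the InvoiceNo prefix."""
    invoice = df["InvoiceNo"].astype(str)
    # Upper-case first so both 'c' and 'C' prefixes are treated as cancellations
    return df.assign(IsCancelled=invoice.str.upper().str.startswith("C"))


# Hypothetical toy invoices
sample = pd.DataFrame({"InvoiceNo": ["536365", "C536379", "536366"]})
flagged = flag_cancellations(sample)
print(flagged)
```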
A. Cleaning
- Check for null values and inspect how missing data relates across rows using msno (the missingno library).
- Drop duplicated rows, keeping the first occurrence of each duplicate.
- Remove negative values in Price and Quantity.
- Impute missing values in CustomerID by cross-checking other rows with the same Invoice.
- Impute missing values in Description by cross-checking other rows with the same StockCode.
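The cleaning steps above can be sketched in pandas roughly as follows. This is a hedged illustration, not the project's code: the clean_retail helper and toy rows are hypothetical, the null check uses pandas counts (missingno's msno.matrix(df) would give the visual view), and "cross-checking" is interpreted as filling a missing value with the first non-null value from rows sharing the same InvoiceNo or StockCode.

```python
import pandas as pd


def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    # Drop exact duplicate rows, keeping the first occurrence
    df = df.drop_duplicates(keep="first")

    # Remove rows with negative Quantity or UnitPrice
    df = df[(df["Quantity"] >= 0) & (df["UnitPrice"] >= 0)].copy()

    # Fill missing CustomerID with the first known CustomerID on the same invoice
    df["CustomerID"] = df["CustomerID"].fillna(
        df.groupby("InvoiceNo")["CustomerID"].transform("first")
    )

    # Fill missing Description with the first known Description for the same StockCode
    df["Description"] = df["Description"].fillna(
        df.groupby("StockCode")["Description"].transform("first")
    )
    return df


# Hypothetical toy data: one duplicate row, one negative quantity, two gaps to impute
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536365", "536366", "536367"],
    "StockCode": ["85123A", "71053", "71053", "85123A", "22633"],
    "Description": ["WHITE HANGING HEART", "WHITE METAL LANTERN",
                    "WHITE METAL LANTERN", None, "HAND WARMER"],
    "Quantity": [6, 6, 6, 2, -1],
    "UnitPrice": [2.55, 3.39, 3.39, 2.55, 1.85],
    "CustomerID": [17850.0, None, None, 13047.0, 17850.0],
    "Country": ["United Kingdom"] * 5,
})

missing_before = raw.isnull().sum()  # per-column null counts
print(missing_before)

clean = clean_retail(raw)
print(clean)
```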
B. Feature Engineering: Extracting Time-Series Features/Columns
- We split InvoiceDate into its component features: month, year, hour, day, and week.
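A minimal sketch of this split using the pandas .dt accessor. The add_time_features helper is hypothetical, "day" is interpreted here as the weekday name (an assumption; day of month via .dt.day is equally plausible), and "week" as the ISO week number.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    dates = pd.to_datetime(df["InvoiceDate"])
    return df.assign(
        Year=dates.dt.year,
        Month=dates.dt.month,
        # ISO calendar week; cast from nullable UInt32 to plain int
        Week=dates.dt.isocalendar().week.astype(int),
        # Assumed meaning of "days": name of the weekday
        Day=dates.dt.day_name(),
        Hour=dates.dt.hour,
    )


# Hypothetical toy timestamps
sample = pd.DataFrame({"InvoiceDate": ["2010-12-01 08:26:00", "2011-12-09 12:50:00"]})
feats = add_time_features(sample)
print(feats)
```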
Extract insights from the data that reveal the story behind it.
The number of clusters was chosen using the Elbow Method, the Silhouette Score, and the Davies-Bouldin Score. This yielded 5 clusters, and each customer is assigned one of these cluster labels.
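One way to compare candidate cluster counts with these three criteria is sketched below with scikit-learn. The clustering algorithm is not named above, so KMeans is an assumption, as are standardizing the features first and the hypothetical score_cluster_counts helper. The demo uses synthetic blobs rather than the project's RFM data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def score_cluster_counts(features, k_values=range(2, 9), random_state=42):
    """Fit KMeans for each k and collect elbow, silhouette and Davies-Bouldin scores."""
    X = StandardScaler().fit_transform(features)
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        results.append({
            "k": k,
            "inertia": model.inertia_,  # elbow method: look for the bend in this curve
            "silhouette": silhouette_score(X, model.labels_),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, model.labels_),  # lower is better
        })
    return results


# Synthetic demo data: three well-separated 2-D blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
demo = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

results = score_cluster_counts(demo, k_values=range(2, 7))
for row in results:
    print(row)
```

Once a k is chosen (5 in this project), fitting KMeans(n_clusters=5, n_init=10).fit_predict(X) on the scaled features would give each customer a cluster label.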
Recommender System
Product Bundling
The best marketing doesn't feel like marketing - Tom Fishburne