Insiders Clustering


loyalty_program


Summary



0. Business Problem


All In One Place is a general e-commerce company that sells a wide range of products. In this scenario, the marketing team needs a deep understanding of all customers and has to screen potential candidates for a loyalty program based on their purchase habits.

0.1. What is a Loyalty Program

A loyalty program in the context of e-commerce is a marketing strategy implemented by online retailers to reward and encourage repeat customers. The primary goal of a loyalty program is to foster customer loyalty and increase customer retention. By offering various incentives and benefits, e-commerce businesses aim to create a sense of loyalty and appreciation among their regular shoppers.

A company's loyalty program usually combines elements such as:

  1. Enrollment: Customers are invited to join the loyalty program either during the checkout process or by signing up separately on the website or app.
  2. Personalized Offers: Advanced loyalty programs may use customer data and behavior to tailor personalized offers and recommendations, which can further enhance the customer experience.
  3. Accumulating Points or Rewards: Once enrolled, customers earn points or rewards for specific actions such as making purchases, referring friends, writing product reviews, or engaging with the brand on social media.
  4. Redemption: Customers can then redeem their accumulated points or rewards for discounts, coupons, free products, cashback, or other exclusive offers. The rewards are intended to provide added value and encourage customers to keep coming back to the e-commerce platform.

For All In One Place, however, the main objective is to find key customers with high frequency, monetization and basket size for the loyalty program. There are several ways to do this. The RFM model, for example, is a good baseline for a fast solution that already delivers value to the business.
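As a rough illustration of the RFM baseline mentioned above, the sketch below aggregates one row per customer with recency, frequency and monetary value. The column names (`customer_id`, `invoice_no`, `invoice_date`, `quantity`, `unit_price`), the `rfm_table` helper and the toy orders are assumptions made for this example, not the project's actual schema or data.

```python
import pandas as pd


def rfm_table(orders: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Return recency (days), frequency (invoices) and monetary value per customer."""
    # Revenue per order line (hypothetical columns).
    orders = orders.assign(revenue=orders["quantity"] * orders["unit_price"])
    rfm = orders.groupby("customer_id").agg(
        last_purchase=("invoice_date", "max"),
        frequency=("invoice_no", "nunique"),
        monetary=("revenue", "sum"),
    )
    # Days between the snapshot date and the customer's last purchase.
    rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days
    return rfm[["recency_days", "frequency", "monetary"]]


# Toy data, made up for illustration only.
orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3, 3, 3],
        "invoice_no": ["A1", "A2", "B1", "C1", "C2", "C3"],
        "invoice_date": pd.to_datetime(
            ["2023-01-05", "2023-03-01", "2022-11-20",
             "2023-02-10", "2023-02-25", "2023-03-10"]
        ),
        "quantity": [2, 1, 5, 10, 4, 6],
        "unit_price": [10.0, 30.0, 4.0, 12.5, 20.0, 15.0],
    }
)

rfm = rfm_table(orders, snapshot=pd.Timestamp("2023-03-31"))
print(rfm)
```

Customers with low recency and high frequency and monetary value are the natural first candidates for the loyalty program, which is why this table works as a quick baseline before any clustering.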

The objective is to find the customers who will form the Insiders group (the name of the loyalty program) in this company. The data science team also needs to answer the following business questions to better explain this group:

  1. How many customers are in the Insiders group?
  2. What are the main characteristics of these customers?
  3. What percentage of total invoicing (revenue) comes from this group?
  4. What is the profit forecast for this group?
  5. What are the conditions for a new customer to join or leave the Insiders group?
  6. What guarantees that the Insiders group's profit is better than that of other customers?
  7. What actions should the marketing team take for the Insiders group?

1. Solution Strategy and Assumptions Summary


1.1. General Project Overview

The general workflow is shown in the image below. In short, I query features for all users (or a sample of them) from SQL Server, apply clustering techniques to find customer behaviors and analytical indicators, and classify the users into groups (the loyalty program and other groups). For the Insiders cluster, I build a sales forecast to indicate how this group will perform in the future in terms of monetization.

image

The data is stored in MongoDB documents and read through Metabase for general metrics and a cluster follow-up dashboard. The video below shows the Metabase dashboard used to review the clustering results.

insiders.mp4

2. Exploratory Data Analysis


I divided the EDA into two main steps: business hypothesis validation and cluster profiling and analysis.

2.1. Top 3 Business Hypothesis Validation

1. Customers in the Insiders cluster account for more than 15% of total purchase volume (revenue).

image

2. Customers in the Insiders cluster account for more than 15% of total purchase volume (items).

image

3. Customers in the Insiders cluster have a purchase frequency greater than 50% in each month.

image
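The first two hypotheses come down to comparing a cluster's share of a total against the 15% threshold. A minimal sketch of that check follows, assuming a customer-level DataFrame with hypothetical `cluster`, `revenue` and `items` columns. The numbers are toy values, not the project's results.

```python
import pandas as pd


def cluster_share(customers: pd.DataFrame, value_col: str,
                  cluster_col: str = "cluster") -> pd.Series:
    """Fraction of the total of `value_col` contributed by each cluster."""
    totals = customers.groupby(cluster_col)[value_col].sum()
    return totals / totals.sum()


# Toy customer table, made up for illustration only.
customers = pd.DataFrame(
    {
        "cluster": ["insiders", "insiders", "regular",
                    "regular", "regular", "occasional"],
        "revenue": [900.0, 600.0, 300.0, 250.0, 200.0, 250.0],
        "items": [400, 300, 150, 100, 100, 50],
    }
)

threshold = 0.15  # the 15% threshold stated in the hypotheses
revenue_share = cluster_share(customers, "revenue")
items_share = cluster_share(customers, "items")

print(revenue_share)
print("Hypothesis 1 holds:", revenue_share["insiders"] > threshold)
print("Hypothesis 2 holds:", items_share["insiders"] > threshold)
```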

2.2. Cluster Insiders Overview

The most useful information about the Insiders cluster is in the four lines below, which give a good baseline view of its users. The company has 551 good customers out of a total of 5700.

  1. Number of Customers: 551;
  2. Average Revenue: BRL 9354.00;
  3. Average Recency: 50;
  4. Average Purchases: 13;

With PySpark I get another interesting cluster:

  1. Number of Customers: 409;
  2. Average Revenue: BRL 7255.40;
  3. Average Recency: 49.36;
  4. Average Quantity of Items: 4201.97;
  5. Average Items Returned: *181.92*;

The table below shows the results of the whole clustering process.

image

Using PySpark and other tools, I also get good clusters with GMM. As the images below show, GMM creates well-defined clusters in this embedding space, unlike k-means.

Cluster Dataframe Metrics:

image

GMM Gaussians:

image

3. Data Preparation


In the first cycle, I tried to find key users with a set of computed features and MinMaxScaler, but the silhouette score (my key metric for clustering) was very low. Because of this, I transformed the feature space into an embedding space to raise the silhouette score, and kept feature engineering to very simple computed metrics (recency, qnty_itens and frequency).
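The scaling step described above can be sketched as follows. This is only an illustration of MinMaxScaler applied to the three simple metrics. The customer values are made up, and the embedding transformation itself is not shown because its method is not specified here.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy feature table using the three metric names from the text.
# The values are made up for illustration only.
features = pd.DataFrame(
    {
        "recency": [5, 40, 120, 300],
        "qnty_itens": [1200, 300, 80, 10],
        "frequency": [0.50, 0.20, 0.05, 0.01],
    },
    index=pd.Index([101, 102, 103, 104], name="customer_id"),
)

# Rescale every column to the [0, 1] range so that no feature
# dominates distance-based clustering because of its unit.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(features),
    columns=features.columns,
    index=features.index,
)
print(scaled)
```

Min-max scaling is sensitive to extreme values, since a single outlier compresses everyone else toward zero.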

I found very aggressive outliers in this dataset (see the descriptive statistics in the PySpark notebooks, for example), and I removed all of them from the DataFrame based on my assumptions for this project.

In my last attempt, DBSCAN reached a silhouette score of approximately 0.68, but I could not find an Insiders cluster with that solution. I chose Gaussian Mixture (GMM) for clustering instead. With Optuna optimizing the parameters, GMM reaches a score of approximately 0.480, which is lower than the DBSCAN score, but with GMM it is possible to find a good Insiders cluster whose characteristics differ clearly from the other groups found.
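To make the comparison above concrete, here is a minimal sketch that scores GMM and DBSCAN with the silhouette score on the same points. The data is synthetic (`make_blobs`), the `eps` and `min_samples` values are arbitrary, and excluding DBSCAN noise points before scoring is an assumption of this example, not necessarily what the project notebooks do.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic 2-D points standing in for the customer embedding space.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.8, random_state=42)

# Gaussian Mixture: every point is assigned to its most likely component.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=42)
gmm_labels = gmm.fit_predict(X)
gmm_score = silhouette_score(X, gmm_labels)

# DBSCAN: label -1 marks noise, which is dropped before scoring (assumption).
db_labels = DBSCAN(eps=0.6, min_samples=10).fit_predict(X)
mask = db_labels != -1
n_db_clusters = len(set(db_labels[mask]))
db_score = (
    silhouette_score(X[mask], db_labels[mask]) if n_db_clusters > 1 else float("nan")
)

print(f"GMM silhouette:    {gmm_score:.3f}")
print(f"DBSCAN silhouette: {db_score:.3f} ({n_db_clusters} clusters, "
      f"{int((~mask).sum())} noise points)")
print("GMM cluster sizes:", np.bincount(gmm_labels))
```

A higher silhouette score only says the groups are well separated geometrically. Whether one of them is a usable Insiders cluster still has to be checked by profiling each group, which is the trade-off described above.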

3.1. Embedding Clients Space

In the image below it is possible to observe the customers in their respective groups represented by colors.

image

The Insiders cluster sits a little apart from the others and has the best characteristics obtained. These are the most valuable customers found with the clustering methods used in the project to find top customers for the loyalty program.

I could explore these 551 detected users a little more. There are probably more groups within this group, so it should be possible to find the most valuable of the valuable and deliver them to the marketing team and the other teams when the results are presented.

4. Machine Learning Metrics


I built a package called cluster-ss as a helper for my clustering projects. The package is available at this link: https://pypi.org/project/cluster-ss/

With this package I run all the clustering solutions available in sklearn, which gives me a good overview of my chosen metric (silhouette score).

image

For the best parameters from the Optuna search, I chose K = 9 clusters (9 different customer groups for the business). It is a good number, not too large, and I think it is a good starting point for solving this problem with clustering methods.
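The idea of searching for the number of clusters by silhouette score can be sketched with a plain loop. This example deliberately swaps Optuna for a simple exhaustive search over K to stay dependency-light, and it uses synthetic data, so the K it selects says nothing about the K = 9 chosen for the project. The `score_k` helper is hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic points standing in for the customer embedding space.
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=0.7, random_state=0)


def score_k(X, k: int, seed: int = 0) -> float:
    """Silhouette score of a GMM with k components (-1.0 if it collapses)."""
    labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(X)
    if len(set(labels)) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    return float(silhouette_score(X, labels))


# Evaluate every candidate K and keep the best-scoring one.
scores = {k: score_k(X, k) for k in range(2, 11)}
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"K={k:2d}  silhouette={s:.3f}")
print("Best K by silhouette:", best_k)
```

In practice the best-scoring K is a starting point rather than a final answer, since the business also needs a number of groups it can act on.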

For sales forecasting, I used FB Prophet, a classic and solid solution in my view. There are other popular solutions, such as statsforecast, but I chose Prophet instead.

image

With Spark, I selected Hyperopt for fine-tuning to get the best clusters based on the silhouette score. The score is a little low, but I still get clusters that are useful for the business.