/idxpartners_casestudy

Rakamin Academy x ID/X Partners PBI case study


Case Study

The Implementation of Competency in Company.

ID/X Partners - Data Science


Outline

  • Clustering (Customer segmentation)
  • Charts and Tables
  • Insight Extraction

Dataset can be downloaded here.



Clustering

Problem Statement

  • The problem: A bank wants to segment its customers into groups based on their spending habits and demographics. This will help the bank target marketing campaigns more effectively and understand how to serve its customers better.
  • Idea: The bank can use a clustering algorithm to group customers by similarity. Features such as the amount of money they spend, the types of products they buy, and their age and location determine which customers end up in the same segment.
  • Assignment: Implement a clustering algorithm to segment the bank's customers. This involves choosing the right clustering algorithm for the data, tuning its hyperparameters, and evaluating the resulting clusters.
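The tuning and evaluation part of the assignment could be approached along the following lines. This is only a minimal sketch: it runs on synthetic data standing in for the real customer table, it standardizes the features (a step the solution in this README does not take), and it uses the silhouette coefficient as one possible way to compare values of `n_clusters`. The cluster centers and spreads in the synthetic data are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (assumption: the real data
# would be loaded from customer_data.csv with the same column names).
rng = np.random.default_rng(0)
centers = [(500, 25, 1), (5000, 45, 3), (20000, 65, 6)]  # (balance, age, campaign)
rows = [rng.normal(loc=c, scale=(200, 2, 0.3), size=(100, 3)) for c in centers]
data = pd.DataFrame(np.vstack(rows), columns=["balance", "age", "campaign"])

# Standardize so that balance does not dominate the distance computation.
X = StandardScaler().fit_transform(data)

# Score each candidate number of clusters with the silhouette coefficient
# (values closer to 1 indicate tighter, better separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real data the silhouette curve is rarely this clear-cut, so it is worth reading it alongside an inertia (elbow) plot and the business meaning of the segments.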

Solution

  1. Import the necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
  2. Load the customer data.
data = pd.read_csv("customer_data.csv")
  3. Select the features that you want to use for clustering.
features = ["balance", "age", "campaign"]
  4. Create the K-Means model.
# Fix n_init and random_state so that repeated runs give the same clusters
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
  5. Fit the K-Means model to the data.
kmeans.fit(data[features])
  6. Predict the cluster labels for each customer.
labels = kmeans.predict(data[features])
  7. Visualize the clusters.
plt.scatter(data["balance"], data["age"], c=labels)
plt.xlabel("Balance")
plt.ylabel("Age")
plt.show()
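Once labels exist, a per-cluster summary table is one way to describe what each segment looks like. The sketch below is illustrative only: it uses randomly generated data in place of `customer_data.csv`, picks `n_clusters=3` arbitrarily, and standardizes the features first, which is an assumption on top of the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Small synthetic table standing in for customer_data.csv (assumption).
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "balance": rng.normal(3000, 1500, size=200),
    "age": rng.integers(20, 70, size=200),
    "campaign": rng.integers(1, 8, size=200),
})
features = ["balance", "age", "campaign"]

# Cluster on standardized features, but keep the original columns for reporting.
X = StandardScaler().fit_transform(data[features])
data["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One row per cluster: feature means in original units plus the cluster size.
profile = data.groupby("cluster")[features].mean().round(1)
profile["n_customers"] = data["cluster"].value_counts().sort_index()
print(profile)
```

Reading the rows side by side (for example, a high mean balance combined with a high mean age) is usually what turns cluster numbers into named customer segments.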



Charts and Tables

Problem Statement

  • The problem: A bank wants to understand its customers better so that it can create more effective marketing campaigns. It has a large dataset of customer information, including demographics, financial data, and past marketing interactions.
  • Idea: The bank can use charts and tables to visualize the customer data and identify patterns. This helps it understand how different customer segments behave and which factors influence their purchasing decisions.
  • Assignment: Create charts and tables to visualize the customer data. This involves choosing the right chart or table for each question and formatting the data so that it is easy to understand.

Solution

  1. Import the necessary libraries.
import pandas as pd
import matplotlib.pyplot as plt
  2. Load the customer data.
data = pd.read_csv("customer_data.csv")
  3. Create a bar chart of the number of customers at each age.
# Take x and y from the same value_counts() result so that ages and counts line up
age_counts = data["age"].value_counts().sort_index()
plt.bar(age_counts.index, age_counts.values)
plt.xlabel("Age")
plt.ylabel("Number of customers")
plt.title("Number of customers by age")
plt.show()

  4. Create a pie chart of the number of customers who responded to the marketing campaign.
# Take the labels from the same value_counts() result so that each slice gets the right label
previous_counts = data["previous"].value_counts()
plt.pie(previous_counts, labels=previous_counts.index, autopct="%1.1f%%")
plt.title("Percentage of customers who responded to the marketing campaign")
plt.show()

  5. Create a scatter plot of the customer's age and balance.
plt.scatter(data["age"], data["balance"])
plt.xlabel("Age")
plt.ylabel("Balance")
plt.title("Customer age vs. balance")
plt.show()
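The charts above cover the "charts" half of this section; for the "tables" half, a grouped summary is one option. The sketch below bins ages into brackets and reports count, mean, and median balance per bracket. The data is synthetic, and the bracket edges and labels are hypothetical choices, not something defined by the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for customer_data.csv (assumption).
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "age": rng.integers(18, 90, size=500),
    "balance": rng.normal(1500, 800, size=500).round(2),
})

# Hypothetical age brackets; each bin is right-inclusive, e.g. (17, 29].
bins = [17, 29, 39, 49, 59, 120]
labels = ["18-29", "30-39", "40-49", "50-59", "60+"]
data["age_group"] = pd.cut(data["age"], bins=bins, labels=labels)

# One row per age group with customer count and balance statistics.
summary = (
    data.groupby("age_group", observed=False)["balance"]
    .agg(customers="count", mean_balance="mean", median_balance="median")
    .round(2)
)
print(summary)
```

The same `age_group` column can also feed the bar chart, which makes the "age group" wording in the chart title literal rather than one bar per individual age.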



Insight Extraction

Problem Statement

  • The problem: A bank wants to improve its marketing campaigns by understanding the factors that influence whether a customer will subscribe to a term deposit. It has a dataset of past marketing campaigns, including information about the customers who were contacted, the marketing materials that were used, and whether the customer subscribed to a term deposit.
  • Idea: The bank can use insight extraction techniques to identify the factors that are most strongly correlated with customer subscription. This helps it create more effective marketing campaigns that target the right customers with the right messages.
  • Assignment: Use insight extraction techniques to identify the factors that are most strongly correlated with customer subscription. This involves choosing the right techniques for the data and interpreting the results of the analysis.

Solution

  1. Import the necessary libraries.
import pandas as pd
import numpy as np
  2. Load the customer data.
data = pd.read_csv("customer_data.csv")
  3. Calculate the correlation coefficient between each pair of features.
correlation = data[['age','balance','day','duration','campaign','pdays','previous']].corr()
  4. Identify the features with the strongest correlation with customer subscription.
# 'previous' is the reference column here; drop its trivial self-correlation (1.0) before ranking
most_correlated_features = correlation['previous'].drop('previous').nlargest(5)
  5. Interpret the results of the analysis.
print("The features that are most strongly correlated with customer subscription are:")
print(most_correlated_features)
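If the dataset carries an explicit subscription flag, the correlation can be computed against that flag directly. The sketch below assumes a hypothetical column named `y` holding "yes"/"no" values; that name, the synthetic data, and the rule that longer calls lead to more subscriptions are all assumptions made for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for customer_data.csv (assumption).
rng = np.random.default_rng(3)
n = 1000
duration = rng.exponential(250, size=n)
age = rng.integers(18, 90, size=n)

# Made-up rule: the longer the call, the more likely a "yes".
p_yes = 1 / (1 + np.exp(-(duration - 300) / 100))
data = pd.DataFrame({
    "age": age,
    "duration": duration,
    "y": np.where(rng.random(n) < p_yes, "yes", "no"),  # hypothetical target column
})

# Encode the target as 0/1 so that it can take part in a Pearson correlation.
data["subscribed"] = (data["y"] == "yes").astype(int)

# Correlation of every numeric feature with the target, ranked by absolute size.
corr_with_target = (
    data[["age", "duration", "subscribed"]]
    .corr()["subscribed"]
    .drop("subscribed")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Ranking by absolute value keeps strong negative relationships visible, which `nlargest` on the raw coefficients would push to the bottom. Correlation only captures linear association, so a weak coefficient does not rule out a feature being useful.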