📌 In this section, we will predict bank customer churn using PySpark machine learning algorithms.
📌 To retain its customers and serve them better, the bank needs to predict which customers are likely to leave, so that it can act before they do.
📌 This dataset contains details of a bank's customers. The target variable is binary and indicates whether the customer left the bank (closed their account) or remains a customer.
The features in the given dataset are:
- rownumber: Row number, from 1 to 10000.
- customerid: A unique ID that identifies each customer.
- surname: The customer's surname.
- creditscore: The customer's credit score, a number between 300 and 850 that reflects creditworthiness.
- geography: The country the customer belongs to.
- gender: The customer's gender (Male or Female).
- age: The customer's age, in years, while a customer of the bank.
- tenure: The number of years the customer has been with the bank.
- balance: The customer's bank balance.
- numofproducts: The number of bank products the customer uses.
- hascrcard: Binary flag indicating whether the customer holds a credit card issued by the bank.
- isactivemember: Binary flag indicating whether the customer was an active member of the bank before the point of exit (exit is recorded in the variable "exited").
- exited: Binary flag: 1 if the customer closed their account with the bank, 0 if the customer was retained.
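The column descriptions above suggest a natural grouping by role before any modeling. The following is a minimal sketch of that grouping in plain Python, not the notebook's actual code. The constant and function names are hypothetical, the spelling `hascrcard` is assumed for the credit-card column, and treating that column as a 0/1 flag is also an assumption.

```python
# Column roles inferred from the feature descriptions (assumptions, not the
# notebook's own code). All names below are hypothetical helpers.
ID_COLS = ["rownumber", "customerid", "surname"]    # identifiers, not useful as predictors
CATEGORICAL_COLS = ["geography", "gender"]          # string-valued, would need encoding
NUMERIC_COLS = ["creditscore", "age", "tenure", "balance", "numofproducts"]
BINARY_COLS = ["hascrcard", "isactivemember"]       # assumed to already be 0/1
LABEL_COL = "exited"                                # target: 1 = left, 0 = retained


def feature_columns():
    """Return the candidate model inputs, excluding identifiers and the label."""
    return CATEGORICAL_COLS + NUMERIC_COLS + BINARY_COLS


def check_columns(df_columns):
    """Return the expected columns missing from a list of column names.

    The comparison is case-insensitive, so "Gender" matches "gender".
    """
    expected = ID_COLS + feature_columns() + [LABEL_COL]
    present = {c.lower() for c in df_columns}
    return [c for c in expected if c not in present]
```

With a Spark DataFrame named `df` (hypothetical here), `check_columns(df.columns)` would list any described column that is absent, and `feature_columns()` gives the names that could later be passed to an encoding and assembling step.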