Final Project

Primary Language: Jupyter Notebook

Final-Project-Telecom-Customer-Churn

About Customer Churn

Customer churn is the number or percentage of customers who stop using a product or unsubscribe during a given period. It is typically caused by customer dissatisfaction, cheaper offers from competitors, better marketing by competitors, or other factors.

In a growing business, the cost of acquiring new customers is far greater than the cost of retaining existing ones. Customer churn leads to lost revenue and damages the company's reputation, which in turn makes it harder to acquire new customers.
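
As a small worked example of the definition above, the sketch below computes churn as a percentage of the customers present at the start of a period. The function name `churn_rate` and the figures are hypothetical and are not taken from the project notebook.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 1,000 customers at the start of the month, 50 cancel.
rate = churn_rate(1000, 50)
print(f"{rate:.1%}")  # 5.0%
```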


Objective

  1. Analyze the factors in this dataset that are potential causes of customer churn
  2. Build a machine learning model to predict customer churn

Dataset

The dataset is provided by Kaggle for analyzing and predicting customer churn. It is sample data from IBM consisting of 7043 samples and 21 columns, described as follows:

  • customerID: Customer ID
  • gender: Whether the customer is a male or a female
  • SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
  • Partner: Whether the customer has a partner or not (Yes, No)
  • Dependents: Whether the customer has dependents or not (Yes, No)
  • tenure: Number of months the customer has stayed with the company
  • PhoneService: Whether the customer has a phone service or not (Yes, No)
  • MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
  • InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
  • OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
  • OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service)
  • DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
  • TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
  • StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
  • StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
  • Contract: The contract term of the customer (Month-to-month, One year, Two year)
  • PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
  • PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
  • MonthlyCharges: The amount charged to the customer monthly
  • TotalCharges: The total amount charged to the customer
  • Churn: Whether the customer churned or not (Yes or No)
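
One way to start looking for churn factors in these columns is to compare the churn rate across the values of a categorical column such as `Contract`. The sketch below uses only the standard library and a handful of made-up rows. The rows, the helper name `churn_rate_by`, and the choice of column are illustrative assumptions, not results or code from the notebook.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT taken from the real dataset.
SAMPLE_CSV = """customerID,Contract,Churn
A-001,Month-to-month,Yes
A-002,Month-to-month,No
A-003,Month-to-month,Yes
A-004,One year,No
A-005,One year,No
A-006,Two year,No
"""


def churn_rate_by(rows, column):
    """Map each value of `column` to the share of its rows with Churn == 'Yes'."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row["Churn"] == "Yes":
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = churn_rate_by(rows, "Contract")
for contract, rate in rates.items():
    print(f"{contract}: {rate:.0%}")
```

The same helper could be pointed at any of the Yes/No or multi-valued columns listed above by changing the `column` argument.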

Model Prediction

Build supervised machine learning models with 4 algorithms:

  • Logistic Regression
  • XGBoost Classifier
  • Random Forest
  • K-Nearest Neighbors

Since the dataset is imbalanced, I ran 4 experiments:

  • Training models on the imbalanced dataset
  • Training models on a balanced dataset (Under Sample)
  • Training models on a balanced dataset (Random Over Sample)
  • Training models on a balanced dataset (SMOTE)
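
To make the two random resampling strategies concrete, here is a minimal standard-library sketch. It is not the notebook's code, which may well rely on a dedicated resampling library. The function names and the toy data are assumptions, and SMOTE is not covered because it synthesizes new minority rows by interpolation rather than copying existing ones.

```python
import random
from collections import Counter


def _group_by_label(X, y):
    """Collect feature rows under their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def _flatten(groups):
    """Turn {label: rows} back into parallel X and y lists."""
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    balanced = {
        label: rows + [rng.choice(rows) for _ in range(target - len(rows))]
        for label, rows in groups.items()
    }
    return _flatten(balanced)


def random_under_sample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    balanced = {label: rng.sample(rows, target) for label, rows in groups.items()}
    return _flatten(balanced)


# Toy data: 8 rows of class 0 (no churn) and 2 rows of class 1 (churn).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_over_sample(X, y)
X_under, y_under = random_under_sample(X, y)
print(Counter(y_over))   # 8 rows per class
print(Counter(y_under))  # 2 rows per class
```

Resampling is usually applied to the training split only, so that the test set keeps the original class ratio.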

Best Model:

  • Because the dataset is imbalanced, the evaluation metrics are Recall, F1 Score, and AUC Score
  • Logistic Regression trained on a balanced dataset (Random Over Sample) achieved the highest Recall, F1 Score, and AUC Score
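
For reference, the three metrics can be computed from labels and predicted scores as sketched below. This is a plain-Python illustration, not the notebook's evaluation code. The helper names, the 0.5 threshold, and the toy labels and scores are assumptions. AUC is computed here as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half.

```python
def recall_and_f1(y_true, y_pred, positive=1):
    """Return (recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels (1 = churn) and model scores, chosen only for illustration.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.3, 0.6, 0.05, 0.15, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

recall, f1 = recall_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")

# A model that always predicts "no churn" is 70% accurate on this toy data,
# yet its recall is 0, which is why accuracy alone is a weak guide here.
always_no = [0] * len(y_true)
print(recall_and_f1(y_true, always_no))
```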

Model Interface

Home Page

Analysis

Form Prediction

Result