IBM-Tele-Customer-Churn: A Python repository from kartikdman

IBM Telecom Customer Churn ✅

Telecom Customer Churn Prediction is a machine learning project that predicts how likely a customer is to leave a telecom company. It is written in Python and uses libraries such as scikit-learn, pandas, and numpy.

Problem Statement 😬

The telecom industry is highly competitive, and retaining customers is one of the main challenges. Customer churn is a major problem for telecom companies as it impacts their revenue and growth. Therefore, predicting customer churn is essential for telecom companies to take necessary actions and prevent customers from leaving.

Dataset 😏

This dataset is from IBM and contains information on customers of a telecommunications company. It is used to predict which customers are likely to churn, or leave the company, based on their demographic information, the services they have subscribed to, and their account information.

Content

The dataset contains the following columns:

customerID: unique customer identifier
Count: variable to check number of records, should always be 1
Country: country of customer
State: state of customer
City: city of customer
Zip Code: zip code of customer
Lat Long: latitude and longitude of customer
Gender: gender of customer
Senior Citizen: whether the customer is a senior citizen (yes or no)
Partner: whether the customer has a partner (yes or no)
Dependents: whether the customer has dependents (yes or no)
Tenure Months: number of months the customer has been with the company
Phone Service: whether the customer has phone service (yes or no)
Multiple Lines: whether the customer has multiple lines (yes, no, or no phone service)
Internet Service: type of internet service (DSL, Fiber optic, or no internet service)
Online Security: whether the customer has online security (yes, no, or no internet service)
Online Backup: whether the customer has online backup (yes, no, or no internet service)
Device Protection: whether the customer has device protection (yes, no, or no internet service)
Tech Support: whether the customer has tech support (yes, no, or no internet service)
Streaming TV: whether the customer has streaming TV (yes, no, or no internet service)
Streaming Movies: whether the customer has streaming movies (yes, no, or no internet service)
Contract: type of contract (month-to-month, one year, or two year)
Paperless Billing: whether the customer has paperless billing (yes or no)
Payment Method: payment method (bank transfer, credit card, electronic check, mailed check)
Monthly Charges: amount charged to the customer monthly
Total Charges: total amount charged to the customer
Churn Value: whether the customer churned (1 = yes, 0 = no)
Churn Score: score from 0 to 100 indicating how likely the customer is to churn, produced by a predictive model
CLTV: customer lifetime value
Churn Reason: reason the customer churned, if applicable

This dataset was downloaded from Kaggle (https://www.kaggle.com/yeanzc/telco-customer-churn-ibm-dataset). The data was originally sourced from IBM.
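The column descriptions above imply two quick sanity checks: customerID should be unique and Count should always be 1. The following is a minimal sketch of those checks with pandas. The sample rows are made up for illustration, and `check_basic_integrity` is a hypothetical helper, not a function from this repository. In practice the frame would be loaded from the downloaded file (for example with `pd.read_csv`); the file name is not stated here, so it is left out.

```python
import pandas as pd

# Made-up sample rows using a few of the column names listed above.
sample = pd.DataFrame(
    {
        "customerID": ["0001-A", "0002-B", "0003-C"],
        "Count": [1, 1, 1],
        "Tenure Months": [2, 45, 8],
        "Contract": ["Month-to-month", "One year", "Month-to-month"],
        "Monthly Charges": [53.85, 42.30, 99.65],
    }
)


def check_basic_integrity(df: pd.DataFrame) -> dict:
    """Run simple integrity checks on the churn table (hypothetical helper)."""
    return {
        # customerID is described as a unique identifier.
        "ids_unique": bool(df["customerID"].is_unique),
        # Count is described as always being 1.
        "count_always_one": bool((df["Count"] == 1).all()),
        "n_rows": len(df),
    }


report = check_basic_integrity(sample)
print(report)
```

If either check fails on the real data, that is worth resolving before any modeling step.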

Methodology 🧠

The project follows a standard machine learning pipeline, including data cleaning, preprocessing, feature engineering, model selection, and evaluation. The following steps are involved in the project:

Data Cleaning: The dataset is cleaned by removing missing values and duplicates.
Exploratory Data Analysis (EDA): EDA is performed to understand the relationships between variables and identify patterns.
Preprocessing: The dataset is preprocessed by converting categorical variables into numerical values using encoding techniques.
Feature Selection: The features most relevant to predicting churn are selected for modeling.
Model Selection: Several machine learning models are trained and evaluated to select the best model for prediction.
Evaluation: The performance of the selected model is evaluated using various evaluation metrics such as accuracy, precision, recall, and F1 score.
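The steps above can be sketched end to end on a toy table. This is an illustrative sketch, not the code used in this repository: the rows are made up, one-hot encoding via `pd.get_dummies` is just one possible encoding technique, logistic regression stands in for whichever model is selected, and the target column is assumed to be encoded as 0/1.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Made-up rows: the last two rows add one exact duplicate and one missing value.
raw = pd.DataFrame(
    {
        "Tenure Months": [1, 2, 3, 60, 48, 72, 5, 4, 55, 66, 6, 70, 70, None],
        "Contract": [
            "Month-to-month", "Month-to-month", "Month-to-month", "Two year",
            "One year", "Two year", "Month-to-month", "Month-to-month",
            "One year", "Two year", "Month-to-month", "Two year",
            "Two year", "Month-to-month",
        ],
        "Churn Value": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
    }
)

# Data cleaning: drop rows with missing values, then drop exact duplicates.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)

# Preprocessing: one-hot encode the categorical column into numeric columns.
X = pd.get_dummies(clean.drop(columns="Churn Value"), columns=["Contract"], dtype=float)
y = clean["Churn Value"]

# Hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Model training (logistic regression is an assumption made for this sketch).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation with the four metrics named above.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(metrics)
```

On a table this small the scores say nothing about real performance; the sketch only shows how the steps connect.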

Machine Learning Models

The following machine learning models are used in this project:

Logistic Regression
Decision Tree
Random Forest
Naive Bayes
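One way to compare the four model families listed above is cross-validation with a shared metric. The sketch below makes several assumptions: it uses synthetic data from `make_classification` rather than the churn dataset, picks `GaussianNB` as the Naive Bayes variant, and ranks models by mean F1 score. The hyperparameters are placeholders, not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# The four model families named above; settings are illustrative only.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Mean F1 over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}

best_model = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
print("Best by mean F1:", best_model)
```

For churn data, where churners are usually the minority class, F1 or recall tends to be more informative than accuracy alone, which is why F1 is used for ranking here.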

Requirements

The following Python version and libraries are required to run the project:

Python 3.7 or higher
scikit-learn
pandas
numpy
matplotlib
seaborn

To use this project, install the dependencies with the following command 👉

pip install -r requirements.txt
