Welcome to the Telco Customer Churn Prediction project. This project is dedicated to predicting customer churn in the telecommunications industry, a vital task for businesses aiming to retain their customers and make data-driven decisions to reduce attrition rates.
The dataset used in this project can be found here. It contains information about Telco customers, including demographics, service subscriptions, and customer churn status. The dataset consists of various features, encompassing both numerical and categorical variables.
The primary objectives of this project are:
-
Churn Prediction: Develop accurate machine learning models that can predict which customers are likely to churn. By identifying at-risk customers early, companies can take proactive measures to retain them.
-
Data Analysis: Conduct in-depth exploratory data analysis (EDA) to gain insights into customer behavior, demographics, and the factors influencing churn. EDA helps in understanding the dataset and uncovering actionable patterns.
-
Model Building: Utilize various machine learning algorithms, including Logistic Regression, Decision Trees, and LightGBM, to build predictive models. These models will help classify customers as potential churners or non-churners.
-
Model Evaluation: Evaluate the performance of the models using relevant metrics like accuracy, precision, recall, and F1-score. The evaluation provides an understanding of how well the models perform in practice.
-
Recommendations: Provide actionable recommendations and insights based on the analysis and model outcomes. These recommendations can guide business strategies to reduce churn rates.
To prepare the data for analysis, we undertook the following steps:
- Handling Missing Values: Identified and addressed missing values to ensure data integrity.
- Encoding Categorical Variables: Categorical variables such as 'gender,' 'contract type,' and 'payment method' were encoded into numerical values for model compatibility.
- Scaling Numerical Features: Numerical features were scaled to maintain a consistent range, thereby improving model performance.
During the exploratory data analysis (EDA) phase, we uncovered key insights, including:
- Summary Statistics: Computed basic statistics such as mean, median, and standard deviation for numerical variables.
- Data Visualizations: Created various visualizations such as histograms, box plots, and correlation matrices to better understand data distribution and relationships.
- Churn Analysis: Analyzed churn rates in relation to different customer attributes to identify potential patterns and trends.
We employed several machine learning models for churn prediction, including:
- Logistic Regression: A fundamental classification model used for predicting churn status.
- Decision Tree Classifier: Employed for modeling more complex decision boundaries.
- LightGBM Classifier: A gradient boosting model known for its efficiency and predictive power.
Hyperparameters and model configurations were fine-tuned to achieve the best performance.
To assess model performance, we employed various metrics including:
- Accuracy: Measures the overall accuracy of the models in predicting churn.
- Precision: Evaluates the models' ability to correctly predict positive churn cases.
- Recall: Assesses the models' ability to capture all actual positive churn cases.
- F1-Score: A combination of precision and recall to measure the model's overall performance.
- Confusion Matrices: Visualizations of model predictions versus actual churn status.
In conclusion, the Telco Customer Churn Prediction project successfully explored, preprocessed, and built predictive models to identify customers at risk of churning. Key findings and feature importance were discussed, providing valuable insights for the business to take targeted actions to reduce churn rates. Further enhancements and model improvements can be considered for future work.
This project represents a starting point for addressing customer churn in the telecommunications industry. Future enhancements and areas for improvement include:
- Incorporating more advanced machine learning techniques.
- Gathering additional data sources for richer customer profiles.
- Developing a user-friendly web application for real-time churn prediction.
- Implementing automated alerts for identifying high-risk customers.
By continuously improving and expanding upon the project, businesses can stay proactive in retaining their valuable customers.
To replicate the results and use the code, follow these detailed steps:
-
Clone the repository to your local machine using the following command: gh repo clone Geo-y20/Telco-Customer-Churn-
-
Ensure you have the required Python libraries and dependencies installed by running: pip install pandas numpy seaborn matplotlib scikit-learn lightgbm
-
Execute the provided Python script for data preprocessing, model building, and evaluation.
-
Refer to the documentation within the code for more specific instructions and customization options.
To run the code, you will need the following Python libraries and dependencies:
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn
- lightgbm
This project is licensed under the MIT License.
Author: [George Youhana]