This repository contains a machine learning project that predicts customer churn in the retail sector. The project is implemented in Python and uses pandas, numpy, and scikit-learn for data processing, analysis, and predictive modelling.
The project consists of several key stages:
- Data Cleaning: Dealing with missing values and inconsistencies in the dataset.
- Exploratory Data Analysis (EDA): Analyzing the data to understand the relationships between different variables and churn.
- Feature Engineering: Creating new features to enhance model performance.
- Model Building and Evaluation: Building and evaluating a range of models; a Random Forest Classifier was chosen as the final model.
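The stages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, and every column name (`total_spend`, `n_orders`, `days_since_last_order`, `avg_order_value`) and the toy churn rule are hypothetical stand-ins for whatever the real dataset contains.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "total_spend": rng.gamma(2.0, 100.0, n),
    "n_orders": rng.integers(1, 20, n),
    "days_since_last_order": rng.integers(0, 365, n).astype(float),
})
# Toy label (an assumption): long-inactive customers count as churned.
df["churn"] = (df["days_since_last_order"] > 180).astype(int)
# Inject some missing values to mimic a messy dataset.
df.loc[df.sample(frac=0.05, random_state=0).index, "total_spend"] = np.nan

# Data cleaning: fill missing numeric values with the column median.
df["total_spend"] = df["total_spend"].fillna(df["total_spend"].median())

# Feature engineering: derive the average order value per customer.
df["avg_order_value"] = df["total_spend"] / df["n_orders"]

# Model building: hold out 20% of the rows, stratified by the label.
X = df.drop(columns="churn")
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
preds = model.predict(X_test)
```

For brevity the median is computed on the full table; in a real project, imputation statistics are usually computed on the training split only, to avoid leaking information from the test set.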
The Random Forest model performed best, with an accuracy of 98%, precision of 99%, recall of 90%, F1 score of 94%, and ROC AUC score of 95%. These results suggest the model can be useful for predicting churn and guiding customer retention strategies.
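For readers less familiar with these metrics, here is a small sketch of how accuracy, precision, recall, and F1 relate to confusion-matrix counts. The helper `classification_metrics` is a hypothetical name introduced for illustration and is not taken from the notebook; it assumes none of the denominators is zero. The last lines check that the reported F1 score agrees with the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes non-zero denominators.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted churners who churned
    recall = tp / (tp + fn)     # share of actual churners who were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Consistency check using the precision and recall reported above.
reported_precision, reported_recall = 0.99, 0.90
f1_reported = (
    2 * reported_precision * reported_recall
    / (reported_precision + reported_recall)
)
print(round(f1_reported, 2))  # 0.94
```

The gap between precision (99%) and recall (90%) means that almost every customer flagged by the model did churn, while roughly one in ten churners was missed.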
To run the notebook, make sure the required Python libraries (pandas, numpy, and scikit-learn) are installed. Then clone this repository, open the notebook, and run the cells in order.
Feel free to fork this project and improve the code or models. Pull requests are welcome.