We will be using the `files_for_lab/customer_churn.csv` dataset to build a churn predictor.
- Load the dataset and explore the variables.
- We will try to predict the variable `Churn` using a logistic regression on the variables `tenure`, `SeniorCitizen`, and `MonthlyCharges`.
- Extract the target variable.
- Extract the independent variables and scale them.
- Build the logistic regression model.
- Evaluate the model.
- Even a simple model will give us more than 70% accuracy. Why?
- Synthetic Minority Oversampling TEchnique (SMOTE) is an oversampling technique based on nearest neighbors that adds new synthetic minority-class points between existing minority-class points. Apply `imblearn.over_sampling.SMOTE` to the dataset. Build and evaluate the logistic regression model. Is there any improvement?
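
The steps above could be sketched roughly as follows. This is one possible outline, not a reference solution. The helper names (`split_and_scale`, `fit_and_evaluate`, `oversample_with_smote`), the stratified 80/20 split, `random_state=42`, and `max_iter=1000` are all assumptions made for illustration. The sketch also assumes the three feature columns are numeric with no missing values. It applies SMOTE only to the training split, which is a choice beyond what the exercise states, so that the test rows stay untouched.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Column names taken from the exercise text.
FEATURES = ["tenure", "SeniorCitizen", "MonthlyCharges"]
TARGET = "Churn"


def split_and_scale(df: pd.DataFrame, test_size=0.2, random_state=42):
    """Extract target and features, split them, and standardize the features."""
    y = df[TARGET]
    X = df[FEATURES]
    # Assumption: a stratified 80/20 split; the exercise does not prescribe one.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # Fit the scaler on the training rows only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


def fit_and_evaluate(X_train, y_train, X_test, y_test):
    """Fit a logistic regression and return (accuracy, confusion matrix)."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return accuracy_score(y_test, predictions), confusion_matrix(y_test, predictions)


def oversample_with_smote(X_train, y_train, random_state=42):
    """Return a class-balanced copy of the training data using SMOTE."""
    # Imported here so the rest of the sketch runs without imbalanced-learn.
    from imblearn.over_sampling import SMOTE

    return SMOTE(random_state=random_state).fit_resample(X_train, y_train)


def run_lab(df: pd.DataFrame):
    """Run the baseline model, then the same model trained on SMOTE data."""
    X_train, X_test, y_train, y_test = split_and_scale(df)

    # Class proportions are worth a look before interpreting the accuracy.
    print(y_train.value_counts(normalize=True))

    accuracy, matrix = fit_and_evaluate(X_train, y_train, X_test, y_test)
    print("baseline accuracy:", accuracy)
    print(matrix)

    X_balanced, y_balanced = oversample_with_smote(X_train, y_train)
    accuracy, matrix = fit_and_evaluate(X_balanced, y_balanced, X_test, y_test)
    print("accuracy after SMOTE:", accuracy)
    print(matrix)
```

With the lab file in place, the sketch could be run as `run_lab(pd.read_csv("files_for_lab/customer_churn.csv"))`. Comparing the two confusion matrices, rather than accuracy alone, may make it easier to judge whether SMOTE helped.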