Telco Classification Project: Predicting Churn

Project Description

Classification modeling project to predict customer churn in a telecommunications database queried with SQL
Project created using the data science pipeline (Acquisition, Preparation, Exploration, Analysis & Statistical Testing, and finally Modeling and Evaluation)
Target: Use the initial phases of the data science pipeline to discover drivers of churn and create a model that can predict whether a customer will churn using those derived parameters

Target Dictionary

Target	Description	Data Type
Churn	Column indicating whether or not a customer has churned, originally valued as 'Yes' or 'No'	object

Default Data Dictionary

Column Name	Description	Data Type
payment_type_id	Numerical identity of the payment_type column (1, 2, 3, 4)	int64
internet_service_type_id	Numerical identity of the internet_service column (1, 2, 3)	int64
contract_type_id	Numerical identity of the contract_type column (1, 2, 3)	int64
customer_id	Alphanumeric code for unique customer identity	object
gender	Categorical variable determining customer gender (male, female)	object
senior_citizen	Categorical variable indicating whether the customer is a senior citizen, represented as 0 or 1	int64
partner	Categorical variable determining if a customer has a partner or not	object
dependants	Categorical variable determining if a customer has dependants	object
tenure	Numerical value determining how many months a customer has been with the company from their origin	int64
phone_service	Categorical variable of the phone service for each customer (Yes, No)	object
multiple_lines	Categorical variable of the customer's number of phone line status (Yes: Multiple, No: 1 line, No phone service)	object
paperless_billing	Categorical variable of the customer's payment arrangement for monthly charges (Yes: paperless billing, No: no paperless)	object
monthly_charges	Numerical value determining the customer's monthly charges for service	float64
total_charges	Numerical value determining the customer's total charges since point of origin with the company	converted to float64 from object
contract_type	Categorical variable determining the type of contract the customer is in (Month-to-Month, Two year, One year)	object
internet_service_type	Categorical variable determining the type of internet a customer is using (Fiber optic, DSL, None)	object
payment_type	Categorical variable determining the type of payment arrangement a customer has agreed to (Electronic Check, Mailed Check, Bank transfer, Credit card)	object

Engineered Column Dictionary

Column Name	Description	Data Type
month-to-month	categorical variable with a boolean value of 1 if customer is not in a contract or 0 if they are	int64
fiber	categorical variable with a boolean value of 1 if a customer has fiber internet and 0 if they do not	int64
e_check	categorical variable with a boolean value of 1 if the customer uses electronic checks and 0 if they don't	int64
2_contract	categorical variable with a boolean value of 1 if a customer has a 2 year contract and 0 if not	int64
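
The engineered columns above could be derived roughly as follows. This is a minimal sketch with pandas, not the project's actual prepare code: the function name add_engineered_columns is hypothetical, and the exact spelling of the category labels in the source data is an assumption, which is why the comparison is case-insensitive.

```python
import pandas as pd


def add_engineered_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four 0/1 indicator columns added.

    Hypothetical helper: assumes contract_type, internet_service_type,
    and payment_type hold the text labels listed in the data dictionary.
    """
    out = df.copy()

    def flag(column: str, label: str) -> pd.Series:
        # Case-insensitive match, since exact label casing is an assumption.
        return (out[column].str.strip().str.lower() == label).astype("int64")

    out["month-to-month"] = flag("contract_type", "month-to-month")
    out["2_contract"] = flag("contract_type", "two year")
    out["fiber"] = flag("internet_service_type", "fiber optic")
    out["e_check"] = flag("payment_type", "electronic check")
    return out
```

Each new column is 1 when the customer matches the label and 0 otherwise, so it can be passed directly to a classifier.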

Ideas and Hypotheses

Question 1: Is there a relationship between customers with paperless billing and whether or not they churned?

Null Hypothesis: There is no relationship between paperless billing and whether or not a customer has churned
Alternate Hypothesis: There is a relationship between paperless billing customers and whether or not they have churned

A chi-squared test indicated a significant relationship between paperless billing and whether or not a customer has churned, so we reject the null hypothesis.

Question 2: Is there a relationship between whether a customer has multiple lines and whether or not they have churned?

Null Hypothesis: There is no relationship between customers having multiple lines and whether or not they have churned
Alternate Hypothesis: There is a relationship between customers having multiple lines and whether or not they have churned

A chi-squared test indicated a significant relationship between customers having multiple lines and whether or not they have churned, so we reject the null hypothesis.
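
For readers who want to rerun either test, a chi-squared test of independence can be sketched with pandas and SciPy as shown here. The helper name chi2_independence, the lowercase target column name churn, and the 0.05 significance level are assumptions for illustration; the README does not state which alpha was used.

```python
import pandas as pd
from scipy import stats


def chi2_independence(df, feature, target="churn", alpha=0.05):
    """Chi-squared test of independence between two categorical columns.

    alpha=0.05 is an assumed significance level, not one taken from the project.
    """
    # Contingency table of observed counts for each feature/target pair.
    observed = pd.crosstab(df[feature], df[target])
    chi2, p_value, dof, _expected = stats.chi2_contingency(observed)
    return {
        "chi2": float(chi2),
        "p": float(p_value),
        "dof": int(dof),
        "reject_null": bool(p_value < alpha),
    }
```

Calling chi2_independence(train, "paperless_billing") or chi2_independence(train, "multiple_lines") on the training dataframe would return the test statistic, the p-value, and whether the null hypothesis is rejected at the chosen alpha.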

Models

After extensive feature and parameter tuning, the Decision Tree model performed the best and was used on the test dataframe.
- Accuracy was 76.72% compared to the baseline of 73.47%
- Precision was a solid 82.5%
- The best recall achieved was 60.8%
With more time, I think more feature engineering could be used to better optimize for recall and improve the true positive rate.
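
As a reminder of how these metrics relate to each other, the sketch below computes accuracy, precision, recall, and a baseline from lists of actual and predicted labels. It assumes the baseline is the accuracy of always predicting the most common class, which is a common convention but is not confirmed by this README; the function name classification_metrics is hypothetical.

```python
def classification_metrics(actual, predicted, positive="Yes"):
    """Compute accuracy, precision, recall, and a majority-class baseline."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    total = len(pairs)

    # Assumed baseline: always predict whichever class is more common.
    positives = tp + fn
    baseline = max(positives, total - positives) / total

    return {
        "accuracy": (tp + tn) / total,
        # Of the customers flagged as churning, how many actually churned.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the customers who churned, how many the model flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "baseline_accuracy": baseline,
    }
```

Recall is the share of actual churners the model catches, which is why it is the metric to optimize when the goal is to reach as many at-risk customers as possible.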

Recommendations

Use the prediction model to identify roughly 6 out of every 10 customers who will churn and target them with greater incentives or promotions.
Given the model's good precision and decent recall, we can use it to flag an estimated 60% of churning customers for retention efforts.

Project Reproduction

You will need your own env.py file with your Codeup database credentials to use the sql_connect function
Read the readme file and download the acquire, prepare, and evaluate.py files. Use the functions to recreate the dataframes and separate the data as the final_notebook suggests. Follow the comments to complete statistical testing, and finally use the visualizations to create your own parameters and build the models.
Last but not least, choose your best models from the training dataframe to run on the validate dataframe. Then choose the model with the best validate score to run on the test dataframe.
Finish with concluding remarks and recommendations for business use of the model.
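
The train, validate, and test workflow described above might look roughly like this with scikit-learn. This is only a sketch under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for the prepared Telco dataframe), and the 60/20/20 split, the range of tree depths, and the random seed are illustrative choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared features and churn target (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, weights=[0.73, 0.27], random_state=123
)

# Hold out 20% for test, then split the remainder 75/25 into train/validate.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=123
)

# Fit candidate trees on train and score each one on validate.
candidates = {}
for depth in range(2, 7):
    model = DecisionTreeClassifier(max_depth=depth, random_state=123)
    model.fit(X_train, y_train)
    candidates[depth] = (model, model.score(X_val, y_val))

# Only the model with the best validate score is run on the test set.
best_depth = max(candidates, key=lambda d: candidates[d][1])
best_model, val_accuracy = candidates[best_depth]
test_accuracy = best_model.score(X_test, y_test)
test_recall = recall_score(y_test, best_model.predict(X_test))
```

Keeping the test set untouched until a single model has been chosen on validate gives a less optimistic estimate of how the model will perform on unseen customers.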

ForrestMccrosky/Classification_Project
