/telco_classification_project

This repository will hold all of my work for the telco classification project

Primary LanguageJupyter Notebook

TELCO CHURN CLASSIFICATION PROJECT

Hi there,

Welcome to the README file for the TELCO CHURN CLASSIFICATION PROJECT.

In here, you will find expanded information on this project including goals, how I will be working through the pipeline and a data dictionary to help offer more insight to the variables that are being used.


The Goal

Why are we here?

  • Goal 1: Identify drivers of churn within the TELCO customer database.
  • Goal 2: Develop a classification model that will help accurately predict customer churn

Project Planning

In addition to this README, you can see my TRELLO project pipeline by visiting the following link: https://trello.com/b/6GNQ9yBL

Here is a snapshot of my project planning/setup on the morning of 3/6/21

image info


Data Dictionary

  • Please use this data dictionary as a reference for the variables used within in the data set.
Feature Data Type Description
customer_id object unique customer identifier
senior_citizen int64 # of months as a customer
tenure_in_months int64 Whether one is a senior or not
total_charges int64 total charges since day 1
churn object Yes = Churn, No = Not Churned
average_charges float64 total_charges / tenure_in_months
tenure_in_years float64 tenure_in_months / 12
encoded_churn int64 1 = Churn, 0 = Not Churned
no_partner_depend int64 no partner & no dependents
phone_lines int64 1 = has phone lines, 0 = No phone
stream_tv_mov int64 has streaming tv & streaming movie
online_sec_bckup int64 has online security & online backup
female uint8 1 = female, 0 = not female
male uint8 1 = male, 0 = not male
no_partner uint8 1 = no partner, 0 = has partner
has_partner unit8 1 = has partner, 0 = no partner
dependents_no unit8 1 = no dependents, 0 = has dependents
dependents_yes unit8 1 = has dependents, 0 = no dependents
device_proctection_no uint8 1 = no protection, 0 = has protection
device_proctection_no_int uint8 1 = no internet, 0 = has internet
device_proctection_yes uint8 1 = has protection, 0 = no protection
tch_support_no uint8 1 = no tech support, 0 = has tech support
tch_support_no_int uint8 1 = no internet, 0 = has internet
tch_support_yes uint8 1 = has tech support, 0 = no tech support
paperless_billing_no uint8 1 = no paperless billing 0 = has paperless billing
paperless_billing_yes uint8 1 = has paperless billing, 0 = no paperless billing
monthly_contract uint8 1 = on monthly contract, 0 = no monthly contract
one_yr_contract uint8 1 = on 1 yr contract, 0 = not on 1 yr contract
two_yr_contract uint8 1 = on 2 yr contract, 0 = not on 2 yr contract
has_dsl uint8 1 = has dsl, 0 = no dsl
has_fiber_optic uint8 1 = has fiber optic, 0 = no fiber optic
no_internet uint8 1 = no internet, 0 = has internet
pmt_bank transfer uint8 1 = pay w/bank transfer, 0 = no bank transfer
pmt_cc uint8 1 = pays w/credit card, 0 = no credit card
pmt_electronic_check uint8 1 = pays w/elec check, 0 = no elec check
pmt_mailed_check uint8 1 = pays w/mail check, 0 = no mail check

Hypothesis and Questions

- There is a relationship between churn and customers who use fiber optic internet who are single(no dependents), on month to month contracts, have no tech support and pay with electronic check.
  • Is there a relationship between churn having fiber optic internet?
  • Is there a relationship between churn and having no tech support?
  • Is there a relationship between churn and being on a monthly contract?
  • Is there a relationship between churn and having no partners and no dependents?
  • Is there a relationship between churn and paying with an electronic check?

How To Recreate This Project

To recreate this project you will need use the following files:

acquire.py prepare.py explore.py model.py

Step 1. Import all necessary libraries to run functions. These can be found in each corresponding .py file

Step 2. Use acquire.py to help pull data from your SQL database. You will need to have your own env.py file with your login information to be able to cnnect and pull fomr your SQL program.

Step 3. Use the clean_telco(df) function followed by the train_validate_test_split(df, seed=123) to prep the data.

Step 4. Verify that your data has been prepped using df.head()

Step 5.. Enter the explore phase using the different univariate, bivariate, and multivariate functions from the explore.py file. This is also a great time to use different line plots, swarm plots, and bar charts. The associated libraries to make the charts happen are from matplotlib, seaborn, scipy, and sklearn

Step 6. Evaluate and model the data using different classification algorithms.

{ 
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import graphviz
from graphviz import Graph
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
}

Step 7. After you have found a model that works, you can export a CSV with your predictions and probablities using the get_model() function from model.py.

For a more detailed look, please visit my final notebook for telco for further assistance.


Key Findings, Takeaways, Recommendations

Through this classification project I came away with the following key takeways:

  • There is no one main driver churn and it appears a number of elements go into a customer churning out within 5 months

  • The customers who are leaving are mainly month to month contract holder who have no partner or dependents

    • they can cut the cord much easier when they only have to convince themselves to leave
  • Paying by electronic check is indicator that customer might churn when paired with other features

  • A customer having fiber optic internet appears to be a strong indicator of churn but having fiber optic doesn't make a person want to leave the company, it just means they don't want slow internet so this feature should be paired with others

Recommendations & next steps:

  • A feature I would have liked to create would be a column based on the number of services a customer has. I believe that the more services a person has, the less likely will leave based on the fact that nobody signs up for all the bells and whistle just so they can leave 5 months later and have to go through the whole process again.

  • I believe an age column would be helpful to further determine churn and help marketing target that subset

  • Another column of information that could be helpful is area of residence to see if perhaps there is a concentration of where churn is occuring

  • Customers on month to month contracts need to be incentivized in different ways to stay on. This can be achieved through discounts, money back, giveaways, prompt customer service, and good purchasing experiences too name a few.