Churn Prediction (Readme In progress)
The goal of this project is to predict which clients are likely to leave the company, so that churn can be prevented. To achieve this, we'll use machine learning algorithms to estimate each customer's probability of leaving, and then use the gain curve to draw up our scenarios.
The result of our project is this scenario.
The project is available at the following link: https://github.com/denisshiki/churn_prediction/blob/main/churn.ipynb
Business Context:
Topbank is a company that offers a diverse range of banking services across several European countries, including business loans, insurance, wealth management and so on.
The company's main product is the bank account. With this service the client can deposit their salary, take out loans, check the balance, etc. The account has a 12-month term; when that term expires, the client has to renew the contract in order to keep using the company's services.
Business Question:
You belong to the data science team and your job is to prevent customers from leaving the company, but before solving the problem you have to ask these three questions.
-
Motivation: What is the motivation behind this problem? We ask our superiors who raised the problem and what its context is. Unlike in other problems, the motivation in this project is simple and easy to understand.
-
Root Cause: After understanding the motivation of the problem, we try to understand its root cause. We talk to other people in the company and discover that the company's income is decreasing, and that one of the main causes is that clients are leaving the company.
-
Deploy: Having understood the root cause, we negotiate which product we'll deploy. Because this is our first CRISP cycle, we'll deploy only a result; in the next cycles we'll deliver a better data product.
CRISP Cycle
As in other projects, we'll use the CRISP method here. For further explanation, see the link below:
https://www.ibm.com/docs/en/spss-modeler/SaaS?topic=dm-crisp-help-overview
The idea of this method is to repeat the cycle until the project is finished. Each cycle is an end-to-end pass through the project that delivers the data product we promised, and each cycle follows the schema below:
-
Business Understanding: In this stage we talk to other teams in order to identify the problem and the solution that we'll deploy.
-
Data Mining: At this point we search for the data that will help us in this project. In this exercise we'll use the data provided by the Kaggle platform.
-
Data Cleaning: In this stage we clean the data by removing NaN values, verify the data types, compute descriptive statistics and rename the columns.
-
Data Exploration: We'll formulate business hypotheses in order to produce actionable insights for the company. This also helps us see the data from another perspective and gives us more ideas for future projects.
-
Data Modelling: Here we apply transformations and rescaling and check for outliers in some variables, with the aim of preparing the data for the machine learning algorithms.
-
Machine Learning Algorithms: We'll apply several machine learning algorithms and compare them using evaluation metrics to determine which is the best.
-
Machine Learning Evaluation: After selecting our model, we evaluate what our income scenarios look like. This is an important step because it determines whether we continue with the project or not.
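To make the evaluation stage more concrete, here is a minimal sketch of how a cumulative gain curve can be computed from predicted churn probabilities. The helper name `cumulative_gain` and the toy labels and scores are assumptions made for illustration; they are not taken from the project notebook.

```python
def cumulative_gain(y_true, y_score):
    """Return (fraction_contacted, fraction_captured) for a gain curve.

    y_true  : list of 0/1 labels (1 = customer churned)
    y_score : list of predicted churn probabilities, same length as y_true
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    total_churners = sum(y_true)
    if total_churners == 0:
        raise ValueError("the gain curve is undefined without any churners")

    # Rank customers from highest to lowest predicted churn probability.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)

    contacted, captured = [0.0], [0.0]
    churners_so_far = 0
    for rank, idx in enumerate(order, start=1):
        churners_so_far += y_true[idx]
        contacted.append(rank / len(order))                 # share of customers contacted
        captured.append(churners_so_far / total_churners)   # share of churners reached
    return contacted, captured


# Toy data (hypothetical): four customers, two of whom churned.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
contacted, captured = cumulative_gain(y_true, y_score)
for c, g in zip(contacted, captured):
    print(f"contacted {c:.0%} of customers -> reached {g:.0%} of churners")
```

Reading the curve at a given share of contacted customers shows what share of churners the model would reach, which is the basis for comparing scenarios.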
Data:
The data were extracted from the Kaggle platform at the link below: https://www.kaggle.com/c/rossmann-store-sales
It contains the following attributes:
Attribute | Description |
---|---|
RowNumber | Number of the row |
CustomerID | ID of the client |
Surname | The client's surname |
CreditScore | The client's credit score |
Geography | The country where the client lives |
Gender | The client's gender |
Age | The client's age |
Tenure | The time ... store types, namely a, b, c, d |
Assortment | Indicates the product assortment level: a = basic, b = extra, c = extended |
CompetitionDistance | Distance in meters to the nearest competitor store |
CompetitionOpenSince[Month/Year] | Indicates the approximate year and month in which the nearest competitor opened its store |
Promo | Indicates whether the store ran a promotion on that day |
Promo2 | Indicates whether the store continued the promotion: 0 = store is not participating, 1 = store is participating |
Promo2Since[Year/Week] | Describes the year and week in which the store participated in Promo2 |
PromoInterval | Describes the consecutive months in which the store participated in the promotion |
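
The column renaming mentioned in the Data Cleaning stage can be sketched as below. This is only an illustration, assuming the goal is to convert CamelCase attribute names such as `CreditScore` to snake_case; the helper `to_snake_case` is a hypothetical name, and the notebook may rename columns differently.

```python
import re


def to_snake_case(name):
    """Convert a CamelCase column name to snake_case."""
    # Insert an underscore before a capitalised word, e.g. "RowNumber" -> "Row_Number".
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert an underscore between a lowercase letter or digit and an uppercase letter,
    # e.g. "CustomerID" -> "Customer_ID".
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step)
    return step.lower()


# A few of the attribute names listed in the table above.
columns = ["RowNumber", "CustomerID", "Surname", "CreditScore",
           "Geography", "Gender", "Age", "Tenure"]
renamed = [to_snake_case(col) for col in columns]
print(renamed)
```

With a pandas DataFrame, the resulting list could be assigned to `df.columns`, assuming the data is loaded that way.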