Disclaimer: The following context is completely fictional. The company, the context, the CEO, and the business questions exist only in my imagination.
Insurance All is a company that provides health insurance to its customers and the product team is analyzing the possibility of offering policyholders a new product: auto insurance.
As with health insurance, customers of the new auto insurance plan pay Insurance All an annual premium in exchange for an insured amount that covers the costs of an eventual accident or damage to the vehicle.
Last year, Insurance All surveyed about 380,000 customers on their interest in joining a new auto insurance product. Each customer stated whether or not they were interested in purchasing auto insurance, and these responses were saved in a database along with other customer attributes.
The product team selected 127 thousand new customers who did not respond to the survey to participate in a campaign, in which they will receive the offer of the new auto insurance product. The offer will be made by the sales team through telephone calls.
However, the sales team has the capacity to make 20 thousand calls within the campaign period.
In that context, you were hired as a Data Science consultant to build a model that predicts whether or not the customer would be interested in auto insurance.
With your solution, the sales team hopes to prioritize the people with the greatest interest in the new product and thus optimize the campaign by contacting only the customers most likely to make the purchase.
As a result of your consultancy, you will need to deliver a report containing some analysis and answers to the following questions:
- Main insights on the most relevant attributes of customers interested in purchasing auto insurance.
- What percentage of customers interested in purchasing auto insurance will the sales team be able to reach by making 20,000 calls?
- And if the sales team's capacity increases to 40,000 calls, what percentage of customers interested in purchasing auto insurance will the sales team be able to contact?
- How many calls does the sales team need to make to contact 80% of customers interested in purchasing auto insurance?
The Data

The data set is available on the AWS platform. Each row represents a customer and each column contains an attribute that describes that customer, in addition to her response to the survey, in which she mentioned interest or not in the new insurance product.
Insurance All Company
Insurance All is a company that provides health insurance to its customers.
The product team is analyzing the possibility of offering policyholders a new product, auto insurance, and needs to know which customers are the best candidates for the offer.
The company will call 20,000 customers to offer the new insurance, based on a survey of about 380,000 health insurance customers.
- What percentage of customers interested in purchasing auto insurance will the sales team be able to reach by making 20,000 calls?
- And if the sales team's capacity increases to 40,000 calls, what percentage of customers interested in purchasing auto insurance will the sales team be able to contact?
- How many calls does the sales team need to make to contact 80% of customers interested in purchasing auto insurance?
We need to develop a machine learning model that ranks customers by their probability of acquiring vehicle insurance.
The model is made available through an API, so the customer can use it whenever needed.
The API is hosted on the Heroku platform and is available at this URL: https://iacvehicleinsurance.herokuapp.com.
Send a list of clients to get each one's probability of acquiring vehicle insurance. Each client is described by the following fields:
- Id
- Gender
- Age
- Driving License
- Region Code
- Previously Insured
- Vehicle Age
- Vehicle Damage
- Annual Premium
- Policy sales channel
- Vintage

The response contains the same fields plus the model's prediction:

- Id
- Gender
- Age
- Driving License
- Region Code
- Previously Insured
- Vehicle Age
- Vehicle Damage
- Annual Premium
- Policy sales channel
- Vintage
- prediction
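
A request to the API could look like the sketch below. It is only an illustration: the `/predict` path, the use of a JSON POST body, the snake_case field names, and the sample values are all assumptions that this README does not confirm, so check them against the deployed service before relying on them.

```python
import json
import urllib.request

# Base URL from this README; the "/predict" path is a hypothetical endpoint name.
API_URL = "https://iacvehicleinsurance.herokuapp.com/predict"


def build_payload(customers):
    """Serialize a list of customer dicts into a UTF-8 JSON request body."""
    return json.dumps(customers).encode("utf-8")


def request_predictions(customers, url=API_URL, timeout=30):
    """POST the customers to the API and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(customers),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative customer only: field names and values are assumptions.
example_customers = [
    {
        "id": 1,
        "gender": "Male",
        "age": 44,
        "driving_license": 1,
        "region_code": 28.0,
        "previously_insured": 0,
        "vehicle_age": "> 2 Years",
        "vehicle_damage": "Yes",
        "annual_premium": 40454.0,
        "policy_sales_channel": 26.0,
        "vintage": 217,
    }
]
```

Calling `request_predictions(example_customers)` would send the list to the service. The network call is kept inside a function so the payload can be inspected without contacting the server.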
We use Python and its libraries to collect, visualize, prepare, and transform the data, select features, and train models that predict the probability of vehicle insurance acquisition.
- Statistical Description - Draw insights from descriptive statistics
- Exploratory Data Analysis - Understand the business model and feature relevance
- Balance Data - Balance the classes of the vehicle insurance interest target
- Data Preparation - Encode, rescale, and transform the data for the models
- Modeling - Test 5 machine learning models
- Cross Validation - Cross-validate the 2 best-performing models
- Hyperparameter Tuning - Find the best parameters for the models
- Business Model - Convert ML metrics into business metrics
Almost 98% of the customers interested in vehicle insurance have a history of vehicle damage.
Customers whose cars are between 1 and 2 years old account for 74.34% of those interested in vehicle insurance.
- XGBoost
- Logistic Regression
- Naive Bayes
- Random Forest
- KNN
- LGBM
All models performed better than the random method (calling customers in random order). The lift curve shows how much better each model is.
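
To make the comparison with random calling concrete, here is a minimal sketch of how gain and lift at a given number of calls can be computed from true labels and model scores. The function name `gain_and_lift` and the toy data are hypothetical and are not taken from the project code.

```python
def gain_and_lift(y_true, scores, k):
    """Return (gain, lift) after contacting the k highest-scored customers.

    gain: share of all interested customers found among the top k.
    lift: gain divided by the share of the list contacted (k / n),
          i.e. how many times better than calling at random.
    """
    n = len(y_true)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of customers")
    total_interested = sum(y_true)
    if total_interested == 0:
        raise ValueError("y_true contains no interested customers")
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    gain = hits / total_interested
    lift = gain / (k / n)
    return gain, lift


# Toy data (hypothetical): 1 = interested, 0 = not interested.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.01]
gain, lift = gain_and_lift(y_true, scores, k=4)
print(f"gain={gain:.2f}, lift={lift:.2f}")
```

A lift of 1.0 means the ranking is no better than random order. Evaluating the function for increasing values of `k` gives the points of a lift curve.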
In all scenarios, Random Forest and LGBM had the best performance, but the exported Random Forest model is 1,521 times larger than the LGBM model, so I'll use the LGBM model in production.
At the beginning of this project, the CEO of Insurance All asked some questions about the model's performance. I answer each of them below.
What percentage of customers interested in purchasing auto insurance will the sales team be able to reach by making 20,000 calls?
By making 20,000 calls to customers ranked by the model, the sales team will reach 46.98% of all interested customers, about 3 times as many as with the random method.
And if the sales team's capacity increases to 40,000 calls, what percentage of customers interested in purchasing auto insurance will the sales team be able to contact?
By making 40,000 calls to customers ranked by the model, the sales team will reach 81.20% of all interested customers, about 2.57 times as many as with the random method.
How many calls does the sales team need to make to contact 80% of customers interested in purchasing auto insurance?
39,130 calls are needed to reach 80% of the interested customers.
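
The number of calls needed for a target share can be read off the ranked list. The sketch below shows one way to do it, assuming true labels are available for a validation set. The function name `calls_needed` and the toy data are hypothetical.

```python
def calls_needed(y_true, scores, target=0.80):
    """Smallest number of top-ranked calls that reaches `target` share
    of all interested customers."""
    total_interested = sum(y_true)
    if total_interested == 0:
        raise ValueError("y_true contains no interested customers")
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    for calls, (_, label) in enumerate(ranked, start=1):
        hits += label
        if hits / total_interested >= target:
            return calls
    return len(ranked)  # only reached if target is greater than 1.0


# Toy data (hypothetical): 1 = interested, 0 = not interested.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(calls_needed(y_true, scores, target=0.80))
```

On real data the same loop runs over the full campaign list, sorted by the model's predicted probability.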
20,000 calls (15.74% of the customer list)
Supposing each vehicle insurance policy sells for $2,000, the model brings 298% of the random-method result, totaling $5,836,494.77.
40,000 calls (31.49% of the customer list)
Supposing each vehicle insurance policy sells for $2,000, the model brings 257.9% of the random-method result, totaling $9,288,989.54.
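
The percentages above can be checked with a short calculation. This sketch assumes that random calling reaches the same share of interested customers as the share of the list that is called, so the model's result relative to random is the share reached divided by the share called. The helper name `lift_percent` is hypothetical.

```python
# Figures quoted in this README, in percent.
scenarios = {
    20_000: {"share_reached": 46.98, "share_called": 15.74},
    40_000: {"share_reached": 81.20, "share_called": 31.49},
}


def lift_percent(share_reached, share_called):
    """Model result as a percentage of the random-calling result.

    Assumes random calling reaches `share_called` percent of the
    interested customers.
    """
    return 100 * share_reached / share_called


for calls, shares in scenarios.items():
    print(f"{calls:,} calls: {lift_percent(**shares):.1f}% of the random baseline")
```

The results land close to the 298% and 257.9% quoted above. Small differences come from rounding in the published percentages.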
My name is Saulo Ferreira Cunha. I have been an IT student since 2004 and I am a data scientist in training.
Email: saulofcunha@outlook.com
Linkedin: https://www.linkedin.com/in/saulo-ferreira-cunha-6a6ba232/