Market Analysis in the Banking Domain
DESCRIPTION
Background and Objective:
Your client, a Portuguese banking institution, ran a marketing campaign to convince potential customers to invest in a bank term deposit scheme. The campaign was based on phone calls. Often, the same customer was contacted more than once by phone to assess whether they would subscribe to the bank term deposit. You have to perform the marketing analysis of the data generated by this campaign.
Domain: Banking (Market Analysis)
Dataset Description
The data fields are as follows:
- age: numeric
- job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
- marital: marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note: 'divorced' means divorced or widowed)
- education: education level (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
- default: has credit in default? (categorical: 'no', 'yes', 'unknown')
- housing: has housing loan? (categorical: 'no', 'yes', 'unknown')
- loan: has a personal loan? (categorical: 'no', 'yes', 'unknown')
- contact: contact communication type (categorical: 'cellular', 'telephone')
- month: month of last contact (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
- day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
- duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (for example, if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call "y" is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
- campaign: number of times a customer was contacted during the campaign (numeric, includes last contact)
- pdays: number of days passed after the customer was last contacted from a previous campaign (numeric; 999 means customer was not previously contacted)
- previous: number of times the customer was contacted before this campaign (numeric)
- poutcome: outcome of the previous marketing campaign (categorical: 'failure', 'nonexistent', 'success')
Output variable (desired target):
- y: has the customer subscribed to a term deposit? (binary: 'yes', 'no')
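The notes on duration, pdays, and y imply a small amount of cleaning before any modelling. The following is a minimal sketch of those rules in plain Python rather than Spark, so that it runs without a cluster. The helper name prepare_record, the realistic flag, and the sample record are assumptions made for illustration; they are not part of the dataset or the assignment.

```python
PDAYS_NOT_CONTACTED = 999  # sentinel value described in the field list


def prepare_record(raw, realistic=True):
    """Return a cleaned copy of one campaign record (hypothetical helper)."""
    record = dict(raw)  # copy, so the caller's record is left untouched

    # 'duration' is only known after the call ends, so drop it when the
    # goal is a realistic predictive model rather than a benchmark.
    if realistic:
        record.pop("duration", None)

    # Replace the 999 sentinel with None so it is never treated as a
    # genuine number of days since the previous contact.
    pdays = int(record["pdays"])
    record["pdays"] = None if pdays == PDAYS_NOT_CONTACTED else pdays

    # Encode the target as 1 (subscribed) or 0 (did not subscribe).
    record["y"] = 1 if record["y"] == "yes" else 0
    return record


# Made-up sample record, used only to exercise the helper.
sample = {"age": "41", "duration": "0", "pdays": "999", "y": "no"}
cleaned = prepare_record(sample)
print(cleaned)
```

In a Spark data frame the same three rules would be applied as column operations instead of per-record Python, but the decisions (drop duration, treat 999 as missing, encode y as binary) stay the same.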
Analysis tasks to be done:
The dataset is large, and the marketing team has asked you to perform the following analysis:
- Load the data and create a Spark data frame
- Give the marketing success rate (no. of people subscribed / total no. of entries)
- Give the marketing failure rate
- Give the maximum, mean, and minimum age of the targeted customers
- Check the quality of customers by computing their average and median balance
- Check if age matters in marketing subscription for deposit
- Check if marital status mattered for a subscription to deposit
- Check if age and marital status together mattered for a subscription to deposit scheme
- Do feature engineering on the bank data and determine the effect of age on the campaign outcome
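
The rate and grouping tasks above reduce to a few simple aggregations. The following is a minimal sketch of that arithmetic in plain Python rather than Spark, so that it runs anywhere. The five records are made up for illustration, and the age_band cut-offs (under 30, 30-59, 60+) are an assumption, not bands prescribed by the assignment.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative records; not taken from the campaign data.
records = [
    {"age": 25, "marital": "single", "y": "yes"},
    {"age": 34, "marital": "married", "y": "no"},
    {"age": 47, "marital": "married", "y": "no"},
    {"age": 62, "marital": "divorced", "y": "yes"},
    {"age": 29, "marital": "single", "y": "no"},
]


def success_rate(rows):
    """No. of people subscribed divided by total no. of entries."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(r["y"] == "yes" for r in rows) / len(rows)


def age_band(age):
    """Hypothetical age bands, chosen only for this illustration."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"


def rate_by(rows, key_fn):
    """Success rate per group, where key_fn maps a record to its group."""
    groups = defaultdict(list)
    for r in rows:
        groups[key_fn(r)].append(r)
    return {key: success_rate(group) for key, group in groups.items()}


overall = success_rate(records)
failure = 1 - overall  # failure rate is the complement of the success rate

ages = [r["age"] for r in records]
age_summary = (min(ages), mean(ages), max(ages))

by_marital = rate_by(records, lambda r: r["marital"])
by_age_marital = rate_by(records, lambda r: (age_band(r["age"]), r["marital"]))

print(overall, failure, age_summary)
print(by_marital)
print(by_age_marital)
```

On a Spark data frame the same steps would typically be a filter and count for the rates and a group-by aggregation for the breakdowns. Banding age into a derived column is one possible form of the feature engineering the last task asks for.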