Bank GoodCredit wants to predict a credit score for its current credit card customers. The credit score will denote a customer’s creditworthiness and help the bank reduce credit default risk.
- A credit card is a financial instrument, which can be used more than once to borrow money or buy products and services on credit.
- Banks, retail stores and other businesses generally issue these.
Credit limit:-
- The maximum amount of charges a card holder may apply to the account.
Annual Fee:-
- A bank charge for the use of a credit card, levied each year, which varies with the type of card held. Banks usually charge an initial fixed amount in the first year and then a lower amount as a yearly renewal fee.
Revolving Line Of Credit:-
TARGET COLUMN == Bad_label
- 0 represents a customer with a good credit history
- 1 represents a customer with a bad credit history
Build a model with the data provided
- Data exploration insights – what did you find and what decision did you take?
- Feature matrix - List of features selected with gain
- Model evaluation - Gini and rank ordering
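The Gini and rank-ordering checks named above can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: the helper names (`auc_from_scores`, `gini`, `bad_rate_by_bin`, `is_rank_ordered`), the toy labels and scores, and the choice of five bins are all assumptions. Gini is computed as 2 × AUC − 1, and rank ordering is taken to mean that the bad rate never increases as the predicted score falls.

```python
def auc_from_scores(y_true, scores):
    """AUC: chance that a random bad customer (1) outscores a random good one (0)."""
    bad = [s for y, s in zip(y_true, scores) if y == 1]
    good = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5  # ties count as half a win
    return wins / (len(bad) * len(good))


def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc_from_scores(y_true, scores) - 1.0


def bad_rate_by_bin(y_true, scores, n_bins=5):
    """Sort by descending score, cut into n_bins groups, return each group's bad rate.

    Assumes there are at least n_bins observations.
    """
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    ordered = [y for _, y in pairs]
    n = len(ordered)
    rates = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        rates.append(sum(chunk) / len(chunk))
    return rates


def is_rank_ordered(rates):
    """True when the bad rate never increases from one bin to the next."""
    return all(a >= b for a, b in zip(rates, rates[1:]))


# Toy example (made-up data): 1 = bad credit history, score = predicted P(bad).
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

rates = bad_rate_by_bin(y_true, scores)
print("Gini:", round(gini(y_true, scores), 4))
print("Bad rate per bin:", rates)
print("Rank ordered:", is_rank_ordered(rates))
```

The pairwise loop is quadratic in the number of customers, so for the full data a library routine such as scikit-learn's `roc_auc_score` would be the more practical way to get the AUC.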
- Fetching data from the database.
- Domain analysis.
- EDA: univariate, bivariate, and multivariate analysis.
- Data preprocessing/Feature Engineering.
- Feature selection.
- Model creation.
- Model Evaluation.
- Model Saving.
- This table contains customers’ historical account data and payment history.
- This table contains customers’ historical enquiry data, such as enquiry amount and enquiry purpose.
- Current customer applications with demographic data.
- Note that demographic features are renamed as features and obscured in accordance with privacy policies.
- The data contains a mix of data types.
- Some features are unique for every row, so no analysis can be performed on them.
- Some features contain blank spaces, which need to be replaced with NaN values.
- Outliers are not imputed; instead, the features are scaled with a scaler that is robust to outliers.
- Categorical features are handled with label encoding.
- The data contains 23896 observations and 92 features.
- The data contains some unique features.
- The data contains one constant column.
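The checks in this summary (blank spaces, unique features, constant columns) could be reproduced along these lines. This is a hedged sketch with pandas: the function name `summarize_columns`, the toy frame, and every column name except `Bad_label` are hypothetical, and "unique" is assumed to mean a column with a different value in every row.

```python
import numpy as np
import pandas as pd


def summarize_columns(df):
    """Replace blank strings with NaN and flag constant and all-unique columns."""
    # Empty or whitespace-only strings become missing values.
    cleaned = df.replace(r"^\s*$", np.nan, regex=True)
    n_unique = cleaned.nunique(dropna=True)
    constant_cols = n_unique[n_unique <= 1].index.tolist()
    unique_cols = n_unique[n_unique == len(cleaned)].index.tolist()
    return cleaned, constant_cols, unique_cols


# Toy frame with hypothetical column names (only Bad_label comes from the data).
df = pd.DataFrame({
    "customer_no": ["c1", "c2", "c3", "c4"],
    "feature_1": ["x", " ", "y", "x"],
    "feature_2": [1, 1, 1, 1],
    "Bad_label": [0, 0, 1, 0],
})

cleaned, constant_cols, unique_cols = summarize_columns(df)
print(cleaned.shape)         # (rows, columns)
print(cleaned.isna().sum())  # missing values per column
print("constant:", constant_cols)
print("unique:", unique_cols)
```

A continuous numeric feature can also have a different value in every row, so the "unique" list is best treated as a set of candidates to review rather than columns to drop automatically.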
TARGET COLUMN == Bad_label :-
- 0 represents a customer with a good credit history.
- 1 represents a customer with a bad credit history.
- The plot clearly shows that 90% of customers have good credit and 10% do not.
- No domain analysis was done on this data because it contains private customer details held by the bank.
- The data set contains blank spaces, so I converted them to NaN values.
- Most features contain missing values, including the unique features. Missing values in numerical features are imputed with the median or mean, and missing values in categorical features are imputed with the mode (using the fillna function).
- Some features have between 15% and 90% missing values, and some unique features also contain missing values, so these features are dropped.
- Categorical data is handled with label encoding.
- Outliers are not imputed; instead, the features are scaled with a robust scaler.
- The robust scaler is resistant to outliers: it scales each feature using its median and quantile range.
- Drop all unique columns and constant features.
- Change data types where needed.
- Compute correlations and plot a heat map.
- There are no duplicates in the data.
- Save all preprocessed data.
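The preprocessing steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the function name `preprocess`, the toy data, and the column names other than `Bad_label` are hypothetical, the 15% drop threshold is one reading of the missing-value rule, and the median is used for numeric imputation. Everything is fitted on the full toy frame for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, RobustScaler


def preprocess(df, target="Bad_label", max_missing=0.15):
    """Drop sparse columns, impute, label-encode, and robust-scale the features."""
    # Drop feature columns whose share of missing values exceeds the threshold.
    missing_share = df.isna().mean()
    keep = [c for c in df.columns if c == target or missing_share[c] <= max_missing]
    df = df[keep].copy()

    features = [c for c in df.columns if c != target]
    for col in features:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numerical feature: impute with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical feature: impute with the mode, then label-encode.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
            df[col] = LabelEncoder().fit_transform(df[col])

    # RobustScaler centers on the median and scales by the interquartile range.
    scaled = pd.DataFrame(
        RobustScaler().fit_transform(df[features]),
        columns=features,
        index=df.index,
    )
    scaled[target] = df[target]
    return scaled


# Toy data (made up): one numeric, one categorical, one mostly-missing column.
raw = pd.DataFrame({
    "income": [10, 20, 30, 40, 50, 60, 70, 80, 90, np.nan],
    "card_type": ["gold", "silver", "gold", np.nan, "gold",
                  "silver", "gold", "gold", "silver", "gold"],
    "mostly_missing": [1, np.nan, np.nan, np.nan, np.nan,
                       np.nan, 2, 3, np.nan, np.nan],
    "Bad_label": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
})

out = preprocess(raw)
print(out.head())
```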
- The Logistic Regression classifier performs similarly on the training and testing data, scoring 74% on both. Testing accuracy was still lagging, so I applied bagging, after which the score improved slightly (score after bagging = 77.76%).
- The KNeighborsClassifier model performs reasonably well, scoring 87.73% on the training data and 78.03% on the testing data. Testing accuracy was still lagging, so I applied bagging, but the score did not improve (score after bagging = 76.86%).
- The Decision Tree Classifier model fits the training data very well, with a training score of 100%. It also does well on the testing data, with a testing score of 93%.
- The Random Forest Classifier model also fits the training data very well, with a training score of 100%. It also does well on the testing data, with a testing score of 96.51%.
- The Gradient Boosting Classifier model works well on both the training and testing data, with nearly identical scores: 92.44% on training and 92.41% on testing.
- The XGB Classifier model performs well on both the training and testing data, scoring 96.34% and 95.42% respectively.
- The ANN model does not perform well on either the training or the testing data.
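The train-versus-test comparison and the bagging step described for logistic regression might look like the sketch below. It is only an illustration on synthetic data from scikit-learn's `make_classification`, so its scores will not match the figures reported above; the split size, the number of bagged estimators, and the dictionary keys are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly a 90/10 class split (not the bank's data).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn versions.
    "bagged_logistic": BaggingClassifier(
        LogisticRegression(max_iter=1000), n_estimators=10, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Accuracy on both splits; a large gap between them suggests overfitting.
    results[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }
    print(name, results[name])
```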
- From all the models above, I selected the XGB classifier because it performs well on both the training and testing data, with low variance and low bias.
- Save the model using pickle.
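Saving and reloading a fitted model with pickle could be done as in this minimal sketch. A scikit-learn `GradientBoostingClassifier` trained on synthetic data stands in for the selected XGB classifier, and the file name `credit_model.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in model and data; the project itself selected an XGB classifier.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory for this example.
model_path = os.path.join(tempfile.mkdtemp(), "credit_model.pkl")

# Serialize the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and check that it predicts exactly as the original does.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same_predictions = (restored.predict(X) == model.predict(X)).all()
print("Restored model matches:", same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same library versions that were used to save them.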