Credit_card_Score_Classifier: A Jupyter Notebook repository from Jitesh117

Introduction

Developed a Credit Score Classifier that reaches 79.24% accuracy. The work involved:
Training several classification models, including KNN, Random Forests, XGBoost, LightGBM, and SVC.
Tuning the hyperparameters of the KNN model to improve its results.
Combining the individual models through ensemble modeling to improve on their individual predictive performance.
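
The README does not say which ensembling strategy was used. As one possible illustration, the sketch below shows hard (majority) voting over the class labels predicted by several models. The function name `hard_vote` and the toy prediction lists are hypothetical and are not taken from the notebook.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists by majority vote, sample by sample."""
    combined = []
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) yields the most frequent label; on a tie it keeps
        # the label that appeared first, i.e. the earliest model in the list.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Toy predictions for four samples (illustrative values only).
knn_pred = [0, 1, 2, 1]
rf_pred = [0, 1, 1, 1]
xgb_pred = [2, 1, 2, 0]

ensemble_pred = hard_vote([knn_pred, rf_pred, xgb_pred])
print(ensemble_pred)
```

Soft voting (averaging predicted probabilities) is a common alternative when the models expose calibrated probabilities.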

Feature	Count	Unique	Top	Freq
Customer_ID	100,000	12,500	CUS_0xd40	8
Month	100,000	8	January	12,500
Age	100,000	1,788	38	2,833
Occupation	100,000	16	_______	7,062
Annual_Income	100,000	18,940	36,585.12	16
Num_of_Loan	100,000	434	3	14,386
Type_of_Loan	88,592	6,260	Not Specified	1,408
Num_of_Delayed_Payment	92,998	749	19	5,327
Changed_Credit_Limit	100,000	4,384	_	2,091
Credit_Mix	100,000	4	Standard	36,479
Outstanding_Debt	100,000	13,178	1,360.45	24
Credit_History_Age	90,970	404	15 Years and 11 Months	446
Payment_of_Min_Amount	100,000	3	Yes	52,326
Amount_invested_monthly	95,521	91,049	10,000	4,305
Payment_Behaviour	100,000	7	Low_spent_Small_value_payments	25,513
Monthly_Balance	98,800	98,792	-333333333333333333333333333	9
Credit_Score	100,000	3	Standard	5,317

As the table above shows, the dataset contained many inconsistencies and errors that had to be cleaned: placeholder values (_______ in Occupation, _ in Changed_Credit_Limit), an implausible Monthly_Balance value, and missing entries in several columns.
To handle this, I wrote custom functions that clean both the numerical and the categorical columns.
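
The custom cleaning functions themselves are not shown in this README. The following is a minimal sketch of what such helpers might look like, using only the standard library. The names `clean_numeric`, `clean_category`, and `history_to_months` are hypothetical, and the assumption that numeric fields can carry stray underscores (for example `19_`) is an illustration rather than something the table confirms.

```python
import re
from typing import Optional

# Junk tokens visible in the summary table ("_______" and "_"), plus empty strings.
PLACEHOLDERS = {"", "_", "_______"}


def clean_numeric(raw: Optional[str]) -> Optional[float]:
    """Strip stray underscores and thousands separators, then parse as float."""
    if raw is None:
        return None
    text = raw.strip().strip("_").replace(",", "")
    if text in PLACEHOLDERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_category(raw: Optional[str]) -> Optional[str]:
    """Map placeholder tokens to None and trim surrounding whitespace."""
    if raw is None:
        return None
    text = raw.strip()
    return None if text in PLACEHOLDERS else text


def history_to_months(raw: Optional[str]) -> Optional[int]:
    """Convert strings like '15 Years and 11 Months' to a month count."""
    if raw is None:
        return None
    match = re.fullmatch(r"(\d+) Years and (\d+) Months", raw.strip())
    if match is None:
        return None
    years, months = match.groups()
    return int(years) * 12 + int(months)


print(clean_numeric("36,585.12"))
print(clean_category("_______"))
print(history_to_months("15 Years and 11 Months"))
```

Returning `None` for unparseable values keeps them easy to impute or drop later. Sentinel values such as the Monthly_Balance entry in the table would still parse as numbers, so a separate range check would be needed for those.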

The insights gleaned from the pivot tables highlight key findings from this comprehensive analysis.

Before	After

The dataset showed an imbalanced class distribution, which can bias the model toward the majority class.
To address this, we use SMOTE, an oversampling technique that generates synthetic samples for the minority classes.

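from imblearn.over_sampling import SMOTE

# Oversample only the training split so the test set keeps its original distribution
sm = SMOTE(random_state=2)
smote_train_X, smote_train_Y = sm.fit_resample(X_train, y_train)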
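
To make the idea of "synthetic data" concrete, the sketch below shows the interpolation step at the core of SMOTE: a new point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the imbalanced-learn implementation. The function name `smote_like_sample` and the two example points are hypothetical, and the neighbour search is omitted.

```python
import random


def smote_like_sample(point, neighbor, rng):
    """Return a synthetic point on the segment between point and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


rng = random.Random(2)

# Two minority-class samples in a two-feature space (illustrative values only).
minority_point = [1.0, 4.0]
nearest_minority_neighbor = [3.0, 8.0]

synthetic = smote_like_sample(minority_point, nearest_minority_neighbor, rng)
print(synthetic)
```

Because each feature is interpolated with the same `gap`, the synthetic point always lies between the two real samples rather than duplicating either one.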

After training various models on the dataset, I arrived at the following results:

	Precision	Recall	F1-Score	Support
0	0.77	0.86	0.81	5,874
1	0.85	0.75	0.80	10,599
2	0.69	0.82	0.75	3,527
Accuracy			0.79	20,000
Macro Avg	0.77	0.81	0.79	20,000
Weighted Avg	0.80	0.79	0.79	20,000
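
As a sanity check on how the summary rows relate to the per-class rows, the sketch below recomputes the macro and weighted averages from the values in the table. The macro average is the plain mean over the three classes, while the weighted average weights each class by its support. Since the inputs are already rounded to two decimals, the recomputation is approximate. The variable and function names are illustrative.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
report = {
    0: (0.77, 0.86, 0.81, 5874),
    1: (0.85, 0.75, 0.80, 10599),
    2: (0.69, 0.82, 0.75, 3527),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(metric_index):
    """Unweighted mean of one metric over all classes."""
    return sum(row[metric_index] for row in report.values()) / len(report)


def weighted_avg(metric_index):
    """Mean of one metric weighted by each class's support."""
    weighted_sum = sum(row[metric_index] * row[3] for row in report.values())
    return weighted_sum / total_support


# Index 0 = precision, 1 = recall, 2 = F1-score.
macro = [round(macro_avg(i), 2) for i in range(3)]
weighted = [round(weighted_avg(i), 2) for i in range(3)]

print(total_support)  # 20000
print(macro)          # [0.77, 0.81, 0.79]
print(weighted)       # [0.8, 0.79, 0.79]
```

The recomputed values match the Macro Avg and Weighted Avg rows. The macro recall (0.81) is higher than the weighted recall (0.79) because class 1, the largest class, has the lowest recall.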