
Logistic Regression for Credit Approval. The data is from http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data



This project builds a machine learning model to classify whether a credit application should be approved. The data comes from the UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data). The dataset does not describe what each column means, because the underlying personal data is sensitive. To summarize the process I followed in my Jupyter notebook:

    1. Prepare the data: a) find missing values, b) impute them, c) check whether the classification problem is imbalanced
    1. Explore the data to look for relationships
    1. Select features
    1. Split the data into (X_train, y_train), (X_val, y_val), and (X_test, y_test)
    1. Try three models: SVM, Logistic Regression, and K-Nearest Neighbors
    1. Tune hyperparameters
    1. The model with the best results is Logistic Regression at a threshold of 0.72. I used a threshold of 0.72 instead of 0.5 because I place importance on approving credit for the right applicants while still allowing the bank to earn a profit; in my opinion, the key success factor in the lending business is risk management. With credit approval as the positive class, the Logistic Regression model achieves a precision of 94%, a recall of 72%, and an F1 score of 81%.
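
To illustrate the threshold choice in the last step, here is a minimal sketch in plain Python. It is not the notebook's code: the helper names (`classify`, `precision_recall_f1`) and the labels and probabilities are made up for illustration. It only shows how raising the decision threshold trades recall for precision when "approve" is the positive class.

```python
def classify(probabilities, threshold):
    """Label a case as approved (1) only if its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 with 'approved' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels and predicted approval probabilities (hypothetical, not from crx.data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.65, 0.55, 0.70, 0.40, 0.30, 0.10]

for threshold in (0.5, 0.72):
    p, r, f1 = precision_recall_f1(y_true, classify(y_prob, threshold))
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, moving the threshold from 0.5 to 0.72 raises precision from 0.80 to 1.00 and lowers recall from 1.00 to 0.50: fewer bad applicants are approved, at the cost of rejecting some good ones. This is the same trade-off behind the 0.72 threshold described above.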

A further step would be to try other models, such as RandomForestClassifier and XGBoost.
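
As a rough sketch of what such a comparison might look like, the snippet below cross-validates scikit-learn's `RandomForestClassifier` against `LogisticRegression`. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for the prepared credit data. The settings (`n_samples`, `cv=5`, F1 scoring) are illustrative choices, not values from the notebook. XGBoost is left out because it needs a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared features and labels (an assumption, not crx.data).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Candidate models to compare under the same cross-validation protocol.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold F1 score per model, with class 1 treated as the positive class.
cv_f1 = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    cv_f1[name] = scores.mean()
    print(f"{name}: mean F1 = {cv_f1[name]:.3f}")
```

With the real data, the same loop could be run on the training split only, keeping the validation and test splits for threshold selection and final evaluation.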