Credit Classifier

This project aims to build a predictive model that tells whether a person is a good or bad payer based on that person's financial data.

Project Overview

Developed a tool to predict whether a customer is likely to be a good or bad payer with a minimum accuracy of 75%.
Evaluated and compared the performance of two models, logistic regression and naive Bayes, to determine which one is best suited for the task.
Performed data cleaning and preprocessing to ensure the dataset is suitable for exploratory data analysis (EDA) and model training.
Conducted an EDA to gain insights into the dataset, including analyzing the distribution of various attributes and their relationships with the target variable.
Visualized the distributions of selected attributes using appropriate charts and plots.

Python Version: 3.10.8

Packages: pandas, numpy, matplotlib, seaborn, scikit-learn (sklearn)

The dataset contains 1,000 records and is available at this link: financial_data.

It includes the following attributes:

I didn't need to make many changes, but I did the following:

Some findings of the EDA:

70% of the dataset consists of bad payers (700 rows)
Most clients have paid back their existing credits up to now
The credit amount attribute has a right-skewed distribution (roughly exponential in shape) with its peak around 2,000 DM
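The class share and the location of the credit-amount peak can be checked numerically with pandas and NumPy. This is a minimal sketch, not the project's notebook: the column names `target` and `credit_amount` and the toy rows are hypothetical, and the peak is estimated as the midpoint of the most populated histogram bin.

```python
import numpy as np
import pandas as pd


def class_balance(df: pd.DataFrame, target: str) -> pd.Series:
    # Fraction of rows in each class; 700 of 1,000 rows gives 0.7
    return df[target].value_counts(normalize=True)


def histogram_peak(values: pd.Series, bins: int = 20) -> float:
    # Bin the values and return the midpoint of the most populated bin
    counts, edges = np.histogram(values.dropna(), bins=bins)
    i = int(np.argmax(counts))
    return float((edges[i] + edges[i + 1]) / 2)


# Hypothetical toy data for illustration only, NOT the project's dataset
toy = pd.DataFrame({
    "credit_amount": [500, 1800, 1900, 2000, 2050, 2100, 2200, 4000, 9000, 15000],
    "target": ["bad"] * 7 + ["good"] * 3,
})

print(class_balance(toy, "target"))
print(histogram_peak(toy["credit_amount"], bins=10))
```

The peak estimate depends on the number of bins, so on the real data it is worth cross-checking against a histogram plot (for example seaborn's `histplot`).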

First, I converted the columns to int type so they could be fed into the models. Then, I split the dataset into training and test sets.
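One way these two steps could look is sketched below. It assumes the non-numeric attributes are stored as text labels and maps each one to integer category codes with pandas, leaving numeric columns unchanged. The column names, the toy rows, the 70/30 split ratio and `random_state` are hypothetical choices, since they are not stated above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def encode_as_int(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every non-numeric column holds integer category codes."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # Each distinct label gets an integer code (sorted label order);
            # missing values would become -1
            out[col] = out[col].astype("category").cat.codes
    return out


# Hypothetical toy data for illustration only, NOT the project's dataset
toy = pd.DataFrame({
    "housing": ["own", "rent", "own", "free", "own", "rent", "own", "own", "free", "rent"],
    "credit_amount": [1200, 3400, 2100, 800, 5600, 2300, 1900, 4100, 2700, 1500],
    "target": ["good", "bad", "good", "good", "bad", "good", "bad", "good", "good", "bad"],
})

encoded = encode_as_int(toy)
X = encoded.drop(columns="target")
y = encoded["target"]  # "bad" -> 0, "good" -> 1

# Stratifying keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```

Integer codes impose an arbitrary order on nominal attributes; one-hot encoding with `pd.get_dummies` is a common alternative, particularly for logistic regression.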

I tried two different models:

So, the logistic regression model performs better in terms of accuracy.
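A comparison like this can be sketched with scikit-learn by fitting both models on the same split and scoring each with accuracy against the 75% target from the overview. Two assumptions are worth flagging: the naive Bayes variant is not named above, so `GaussianNB` is used here as a stand-in, and the data come from `make_classification` (1,000 synthetic rows with a 70/30 class split), not from the project's dataset, so the printed numbers say nothing about the real results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

TARGET_ACCURACY = 0.75  # minimum accuracy stated in the project overview


def compare_models(X_train, X_test, y_train, y_test) -> dict:
    """Fit each candidate on the training split and return its test accuracy."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),  # assumed variant
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data, NOT the project's dataset
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = compare_models(X_train, X_test, y_train, y_test)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f} (meets target: {acc >= TARGET_ACCURACY})")
best = max(scores, key=scores.get)
print("best model:", best)
```

With a 70/30 class split, a model that always predicts the majority class already reaches about 70% accuracy, so it can help to compare both models against that baseline as well.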