Credit-Card-Fraud-Detection: A Jupyter Notebook repository from Soumayan-pal01

PROJECT TITLE - Credit Card Fraud Detection

GOAL - The aim of the project is to predict fraudulent credit card transactions using machine learning models. This analysis and prediction are important for a bank, as each fraudulent transaction is a financial loss for the bank as well as a loss of customer trust. The dataset contains transactions made by credit cardholders.

WHAT I HAVE DONE

Loading datasets
Handling null values
Checking the distribution of the positive and negative classes
Observing the distribution of the classes against the 'Time' feature
Observing the distribution of the classes against the 'Amount' feature
Splitting the data into training and test sets
Feature scaling the 'Amount' column
Dealing with skewness
Using the PowerTransformer class of scikit-learn to reshape the skewed feature distributions into a more Gaussian-like form
Building Logistic Regression, XGBoost, Decision Tree and Random Forest models on the imbalanced data and tuning their hyperparameters.
Since the data is highly imbalanced, accuracy is always very high (a model that labels every transaction as legitimate is already right almost every time). The ROC-AUC score is therefore used to measure the real performance of each algorithm and to compare them.
Visualizing the imbalanced data
Handling the imbalanced data with different approaches (undersampling, oversampling and SMOTE), then using the resampled data to train the best-fit model and make predictions with it.
Saving the models
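Feature scaling a numeric column such as 'Amount' is commonly done by standardization (z-scores), which is what scikit-learn's StandardScaler does. The sketch below assumes standardization and uses made-up amounts. The function names `fit_standardizer` and `transform` are hypothetical. The mean and standard deviation are learned from the training split only and then reused on the test split, so no information from the test data leaks into training.

```python
import statistics

def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # constant column: avoid division by zero
    return mean, std

def transform(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned on the training split."""
    return [(v - mean) / std for v in values]

# Hypothetical transaction amounts (not taken from the real dataset).
train_amounts = [2.5, 10.0, 49.99, 120.0, 7.25]
test_amounts = [15.0, 300.0]

mean, std = fit_standardizer(train_amounts)
scaled_train = transform(train_amounts, mean, std)  # mean 0, std 1
scaled_test = transform(test_amounts, mean, std)    # same mean/std reused
```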
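scikit-learn's PowerTransformer applies the Yeo-Johnson transform by default and estimates its parameter lambda by maximum likelihood. The sketch below is simplified in two ways:

- It applies the Yeo-Johnson formula with a fixed, hand-picked lambda instead of a fitted one.
- It does not standardize the output, which PowerTransformer does by default.

The toy right-skewed sample is invented. It shows how the transform pulls in a long right tail, measured by the sample skewness.

```python
import math

def yeo_johnson(y, lmbda):
    """Yeo-Johnson transform of one value for a fixed lambda (no fitting)."""
    if y >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(y)                       # lambda == 0: log(y + 1)
        return ((y + 1) ** lmbda - 1) / lmbda
    if abs(lmbda - 2) < 1e-12:
        return -math.log1p(-y)                         # lambda == 2, negative y
    return -((-y + 1) ** (2 - lmbda) - 1) / (2 - lmbda)

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

amounts = [1.0, 2.0, 3.0, 5.0, 10.0, 100.0]          # toy right-skewed sample
transformed = [yeo_johnson(a, 0.0) for a in amounts]  # lambda = 0 chosen by hand

print(skewness(amounts), skewness(transformed))       # skew shrinks after the transform
```

With lambda = 1 the transform is the identity, which is a quick sanity check on the formula.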
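Why accuracy misleads on imbalanced data, and why ROC-AUC does not, can be shown with a toy example. Take a hypothetical dataset with 2 frauds in 100 transactions and a useless "model" that always says "not fraud". That model reaches 98% accuracy, but its ROC-AUC is 0.5, the same as random guessing. The sketch computes ROC-AUC from its ranking definition: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting half. In practice scikit-learn's `roc_auc_score` does this efficiently; the pairwise loop here is only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels: 2 frauds among 100 transactions (2% positive class).
y_true = [1] * 2 + [0] * 98

# A useless model that always predicts "not fraud" with the same score.
always_legit_pred = [0] * 100
always_legit_scores = [0.0] * 100

acc = accuracy(y_true, always_legit_pred)    # 0.98: looks excellent
auc = roc_auc(y_true, always_legit_scores)   # 0.5: no better than chance
```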
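The three resampling ideas can be sketched without any library. imbalanced-learn provides production versions as `RandomUnderSampler`, `RandomOverSampler` and `SMOTE`. Each idea works as follows:

- **Random undersampling** drops majority-class rows.
- **Random oversampling** duplicates minority-class rows.
- **SMOTE** creates new synthetic minority rows by interpolating between a minority point and one of its nearest minority-class neighbours.

The data, the function names and `k=2` are assumptions for illustration; imbalanced-learn's SMOTE defaults to 5 neighbours. Resampling should only ever be applied to the training split, never to the test set.

```python
import random

def random_undersample(X, y, seed=0):
    """Keep every minority row (y == 1) plus an equal-sized random subset of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until both classes have the same count."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    n_extra = (len(y) - len(minority)) - len(minority)  # majority count - minority count
    keep = list(range(len(y))) + [rng.choice(minority) for _ in range(n_extra)]
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X_min, n_new, k=2, seed=0):
    """SMOTE-style synthesis: interpolate between a minority point and a random one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted((p for p in X_min if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Toy data: 6 legitimate rows (0) and 3 fraud rows (1), two features each.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0],
     [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
y = [0] * 6 + [1] * 3

X_min = [x for x, t in zip(X, y) if t == 1]
synthetic = smote(X_min, n_new=4)
```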

MODELS USED

Logistic Regression - A simple classification algorithm that models the probability of a binary outcome (fraud or not fraud) as the logistic (sigmoid) function of a weighted sum of the predictor variables.
XGBoost - The eXtreme Gradient Boosting algorithm is an optimised implementation of gradient boosting. Gradient boosting is an ensemble technique in which weak learners (shallow decision trees) are added one after another, each one fitted to the errors left by the trees before it, so that together they form a strong learner.
Decision Tree - This algorithm builds a tree of if/else rules on the feature values, choosing at each node the split that best separates the classes; each leaf gives a predicted class.
Random Forest - This algorithm is based on ensemble learning. It uses bagging: many decision trees are trained on different bootstrap samples (drawn with replacement) of the training data, each split considers only a random subset of the features, and the trees' predictions are combined by majority vote.
GridSearchCV - scikit-learn's hyperparameter tuning tool. It evaluates every combination of hyperparameter values in a given grid using cross-validation and keeps the best-scoring combination.
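Logistic regression's prediction step fits in a few lines: a weighted sum of the features plus a bias, passed through the sigmoid to give a probability between 0 and 1. The weights, bias and 0.5 decision threshold below are hypothetical values. A trained model learns its weights from data, and for fraud detection the threshold is often tuned. The sigmoid is written in a numerically stable form so that very large negative inputs do not overflow.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical parameters for two features (not learned from the real data).
weights = [2.0, -1.0]
bias = -0.5

p = predict_proba([1.0, 0.5], weights, bias)  # z = 2.0 - 0.5 - 0.5 = 1.0
label = 1 if p >= 0.5 else 0                  # hypothetical 0.5 threshold
```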
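The core boosting loop can be illustrated with plain gradient boosting on one feature. Each round fits a one-split tree (a "stump") to the current residuals and adds a shrunken copy of it to the ensemble. This is a simplified stand-in, not XGBoost itself:

- It uses squared loss, for which the negative gradient is simply the residual.
- XGBoost uses a regularised, second-order version of this idea and, for fraud detection, a logistic loss.

The data, `n_rounds` and `learning_rate` are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split tree: predict the mean residual on each side of x <= t."""
    best = None
    for t in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals (the negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy 1-D data: the target is 1 only in the middle of the range.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1, 0, 0]
model = gradient_boost(xs, ys)
```

No single stump can represent the "bump" in the middle, but the sum of many small stumps can. That is the sense in which weak learners combine into a strong one.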
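"Choosing the split that best separates the classes" is usually measured with an impurity score. The sketch below uses Gini impurity, which is scikit-learn's `DecisionTreeClassifier` default criterion. It finds the best `x <= threshold` split on a single feature, and a full tree repeats this search recursively on each side. The feature values and labels are made-up toy numbers, not taken from the dataset.

```python
def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k ** 2 (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted Gini) of the best 'x <= threshold' split on one feature."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Made-up toy feature values and labels.
values = [5, 12, 20, 250, 400, 900]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(values, labels)  # (20, 0.0): both sides are pure
```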
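Bagging can be sketched with one-split trees on a single feature. Each tree is trained on its own bootstrap sample (row indices drawn with replacement), and the ensemble predicts by majority vote. A real Random Forest also samples a random subset of features at every split, which a one-feature toy cannot show, so this is plain bagging rather than a full Random Forest. The data, `n_estimators=25` and the seed are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(n, rng):
    """Draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def majority(labels, default):
    """Most common label, or `default` for an empty side."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(xs, ys):
    """Tiny base learner: one 'x <= t' split that minimises misclassifications."""
    overall = majority(ys, 0)
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lp, rp = majority(left, overall), majority(right, overall)
        errors = sum(y != lp for y in left) + sum(y != rp for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, lp, rp)
    _, t, lp, rp = best
    return lambda x: lp if x <= t else rp

def bagging(xs, ys, n_estimators=25, seed=0):
    """Train each stump on its own bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = bootstrap_sample(len(xs), rng)
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict

xs = list(range(1, 11))
ys = [0] * 5 + [1] * 5
predict = bagging(xs, ys)
```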
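The grid search process works in three steps:

1. Enumerate every combination in a parameter grid.
2. Score each combination with k-fold cross-validation.
3. Keep the combination with the best mean score.

The model here is a tiny 1-D k-nearest-neighbours classifier whose hyperparameter `k` is tuned. The model, the grid, the toy data and the use of accuracy as the score are all assumptions for illustration. For this project ROC-AUC would be the more sensible score. The folds are simple contiguous slices, whereas scikit-learn uses stratified folds for classifiers by default.

```python
import itertools
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """1-D k-nearest-neighbours: majority label among the k closest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def k_fold_indices(n, n_folds):
    """Split row indices 0..n-1 into n_folds contiguous, nearly equal folds."""
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_score(X, y, params, n_folds):
    """Mean accuracy over the folds, training on the other folds each time."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(tr_X, tr_y, X[i], params["k"]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def grid_search(X, y, param_grid, n_folds=3):
    """Exhaustively try every parameter combination and keep the best CV score."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(X, y, params, n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

X = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 1.1, 5.1, 0.9, 4.8, 1.3, 5.2]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]
best_params, best_score = grid_search(X, y, {"k": [1, 3, 5]})
```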

LIBRARIES NEEDED

numpy
pandas
matplotlib
seaborn
scikit-learn
imblearn
xgboost

Data Visualizations

Conclusion

In this project we performed a detailed analysis of the dataset and tried different approaches to eradicate the imbalance in the data, in order to build a better, more generalised model that gives accurate predictions. In the end, the best models were an XGBoost model with 97% accuracy and a Logistic Regression model with 95% accuracy.
