Resampling-Techniques-For-Imbalance-Problems

This repository covers undersampling and oversampling methods for dealing with the class imbalance problem.

Primary Language: Jupyter Notebook

Project Titles

  1. Credit Fraud Detection
  2. Credit Risk Modelling

Description

These projects aim to address class imbalance in machine learning by implementing and comparing different resampling techniques on imbalanced datasets. They include algorithms for oversampling, undersampling, and combinations of both, as well as cost-sensitive learning approaches. The effectiveness of each technique is evaluated using performance metrics such as accuracy, precision, recall, and F1 score.
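As a rough illustration of the cost-sensitive idea mentioned above, the sketch below computes per-class weights that are inversely proportional to class frequency, so that errors on the rare class cost more during training. The function name and the 900/100 label split are hypothetical and are not taken from the notebooks; the formula is the same heuristic scikit-learn documents for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    weight(c) = n_samples / (n_classes * count(c))
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label vector: 900 majority cases (0) and 100 minority cases (1).
y = [0] * 900 + [1] * 100
weights = balanced_class_weights(y)
print(weights)  # the minority class receives the larger weight
```

Weights like these can be passed to any estimator that accepts per-class or per-sample weights, which leaves the dataset itself unchanged, unlike resampling.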

Table of Contents

  • Usage
  • Resampling Techniques
  • Evaluation Metrics
  • Results
  • Conclusion
  • Acknowledgments

Usage

The project includes sample datasets and Jupyter notebooks demonstrating how to implement the different resampling techniques and evaluate their effectiveness.

Resampling Techniques

The projects include the following resampling techniques:

  • Random oversampling
  • SMOTE (Synthetic Minority Over-sampling Technique)
  • ADASYN (Adaptive Synthetic Sampling)
  • Random undersampling
  • Tomek links
  • Edited nearest neighbours (ENN)
  • CNN (Condensed Nearest Neighbour)
  • Combinations of two sampling methods:
    1. SMOTEENN
    2. SMOTETomek

Evaluation Metrics

The projects use various evaluation metrics to assess the effectiveness of the resampling techniques, including:

  • Accuracy
  • Precision
  • Recall
  • F1 score
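
To make the four metrics concrete, here is a small worked sketch in plain Python. The label vectors are hypothetical (990 legitimate cases, 10 fraud cases) and are not results from the notebooks; they only show why accuracy on its own can look good on imbalanced data while recall and F1 reveal the problem.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth: 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# Baseline that never predicts the minority class.
baseline = classification_metrics(y_true, [0] * 1000)

# Hypothetical model: 20 false alarms, 8 of the 10 positives caught.
y_pred = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2
model = classification_metrics(y_true, y_pred)

print("baseline:", baseline)
print("model:   ", model)
```

The baseline scores higher on accuracy than the model even though it detects no positive cases at all, which is why precision, recall, and F1 are reported alongside accuracy.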

Results

The results were originally presented in the Jupyter notebooks. However, pushing the notebooks to GitHub produced an "Invalid Notebook" error caused by a memory overflow, so all cell outputs had to be cleared before the project could be pushed. Re-running the notebooks should regenerate the outputs.
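
For reference, clearing outputs can also be scripted. The sketch below is one possible approach, not necessarily how the outputs were cleared for this repository: it treats the `.ipynb` file as JSON (notebook format version 4) and empties the outputs of every code cell. The function names and the example path are hypothetical.

```python
import json


def strip_outputs(notebook):
    """Remove outputs and execution counts from all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_notebook_file(path):
    """Load a notebook from disk, strip its outputs, and write it back."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")


# Example call with a hypothetical file name:
# clear_notebook_file("credit_fraud_detection.ipynb")
```

Markdown cells and the source of code cells are left untouched, so the notebook can be re-executed afterwards to reproduce the results.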

Conclusion

The project provides insights into the effectiveness of different resampling techniques for addressing class imbalance in machine learning and highlights the importance of carefully evaluating the performance of these techniques on specific datasets.