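A model can report very high accuracy simply because the classes of the target variable are imbalanced: predicting the majority class most of the time is enough to score well. Such a model may still perform poorly in the field, particularly on the minority class. Imbalanced learning addresses this by balancing the classes of the target variable before or during training.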
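The effect described above can be illustrated with a small sketch in plain Python. The 95:5 class split and the always-predict-majority "model" are assumptions made up for illustration, not results from this project.

```python
# Toy labels with an assumed 95:5 class split (illustrative only).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Plain accuracy: fraction of predictions that match the labels.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall: fraction of each class that was predicted correctly.
recall_majority = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred)) / y_true.count(0)
recall_minority = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / y_true.count(1)

# Balanced accuracy: the mean of the per-class recalls.
balanced_accuracy = (recall_majority + recall_minority) / 2

print("accuracy:", accuracy)
print("minority recall:", recall_minority)
print("balanced accuracy:", balanced_accuracy)
```

Accuracy looks strong here even though no minority sample is ever detected, which is why balanced accuracy or per-class recall is usually a more informative metric on imbalanced data.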
You can install the scikit-learn and imbalanced-learn packages (imported as sklearn and imblearn) via the terminal with:
pip install scikit-learn
pip install imbalanced-learn
or from a Jupyter notebook cell:
!pip install scikit-learn
!pip install imbalanced-learn
There are different approaches to imbalanced learning:
- Oversampling
- Undersampling
- Combination (over- and under-) sampling
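Oversampling adds minority-class samples to the dataset until the classes are balanced. Naive random oversampling draws existing minority samples at random with replacement, while SMOTE synthesizes new samples by interpolating between a minority sample and its nearest minority-class neighbors. The oversampling techniques used here were: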
- Naive Random Oversampling
- SMOTE Oversampling
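Undersampling reduces the majority class in the dataset to create a balance, either by deleting majority samples at random or, as with cluster centroids, by replacing clusters of majority samples with their centroids. The approaches used in this project include: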
- Cluster centroids
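Combination sampling is a resampling technique that combines both over- and undersampling. The SMOTEENN approach was used here, which oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN).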
For the first project, a Logistic Regression model was then trained on the balanced dataset.
The second project used ensemble learning from the imblearn module. Each ensemble trains multiple models and builds resampling into the training of its classifiers. The techniques used were:
- Balanced Random Forest Classifier
- Easy Ensemble Classifier