PSAT 2021 Summer Seminar: binary classification prediction modeling with imbalanced and high-dimensional data
This repo contains a binary classification prediction model for imbalanced, high-dimensional data with masked variables.
Description
- Binary Classification
- Imbalanced Data (1:9 class ratio)
- High-Dimensional Data (train size: 28000 x 200)
- Masked variables
- Metric: F1 score
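With a 1:9 class ratio, accuracy can look good even when the minority class is never predicted, which is why F1 is the more informative metric here. The sketch below is a minimal illustration on made-up labels (not the seminar data); the helper name `f1` and the toy predictions are hypothetical.

```python
def f1(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels with a 1:9 ratio: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class.
all_negative = [0] * 100

# A classifier that finds 6 of the 10 positives and raises 4 false alarms.
mixed = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86

acc_all_negative = accuracy(y_true, all_negative)  # 90 of 100 correct
f1_all_negative = f1(y_true, all_negative)         # no positives found
acc_mixed = accuracy(y_true, mixed)                # 92 of 100 correct
f1_mixed = f1(y_true, mixed)                       # precision = recall = 0.6
```

The two classifiers differ by only two points of accuracy, while their F1 scores are far apart.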
Tried
- Oversampling: SMOTE, SMOTEN, ...
- Undersampling: Neighbourhood Cleaning Rule, ...
- Dimension Reduction: PCA, KPCA, ...
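For readers unfamiliar with SMOTE, the idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The following is a simplified NumPy sketch of that idea only. It is not the imbalanced-learn implementation (which provides `SMOTE`, `SMOTEN`, and `NeighbourhoodCleaningRule`) and is not necessarily how this repo ran its experiments. The function name `smote_sketch`, its parameters, and the random demo data are hypothetical.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows X_min.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbors.
    Requires k < number of rows in X_min.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]

    # Pairwise squared Euclidean distances between minority rows.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbor

    # Indices of the k nearest neighbors for every row.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                # starting rows
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                         # position on the segment

    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Demo on random data standing in for the minority class.
X_min = np.random.default_rng(1).normal(size=(20, 3))
X_new = smote_sketch(X_min, n_new=50)
```

Because every synthetic row is a convex combination of two real minority rows, the new points never leave the region already covered by the minority class.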
Best
- Variable Selection with Kolmogorov-Smirnov test
- Gaussian Naive Bayes Classifier
- No over/undersampling
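The best-performing combination above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repo's actual code: it assumes the Kolmogorov-Smirnov test was applied per variable as a two-sample test between the two classes, and that variables were kept when the p-value fell below a threshold. The helper `ks_select`, the threshold `alpha=0.01`, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data: about 10% positives, 50 variables,
# of which only the first 5 differ between the classes.
rng = np.random.default_rng(0)
n, p, n_informative = 2000, 50, 5
y = (rng.random(n) < 0.1).astype(int)
X = rng.normal(size=(n, p))
X[:, :n_informative] += 1.5 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def ks_select(X, y, alpha=0.01):
    """Return indices of columns whose class-conditional distributions
    differ according to a two-sample KS test (assumed selection rule)."""
    pvals = np.array(
        [ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])]
    )
    return np.flatnonzero(pvals < alpha)


# Select variables on the training split only, to avoid leakage.
selected = ks_select(X_tr, y_tr)

# Gaussian Naive Bayes on the selected variables, with no resampling.
clf = GaussianNB().fit(X_tr[:, selected], y_tr)
pred = clf.predict(X_te[:, selected])
score = f1_score(y_te, pred)
print(f"kept {selected.size} of {p} variables, F1 = {score:.3f}")
```

Selecting variables on the training split and reusing the same column indices at test time keeps the evaluation honest; running the KS test on the full data would leak label information into the test score.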