PSAT-2021-Summer-Seminar

PSAT 2021 Summer Seminar; binary classification prediction modeling with imbalanced and high dimensional data

Primary Language: Jupyter Notebook

This repo collects the prediction modeling work from the seminar: building a binary classifier on imbalanced, high dimensional data.

Description

  • Binary Classification
  • Imbalanced Data (1:9)
  • High Dimensional Data (train size : 28000 rows x 200 columns)
  • Masked variables
  • Metric : F1 score
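
The setup above explains why F1 is a more telling metric than accuracy here. The sketch below is a minimal illustration on made-up labels with a 1:9 ratio; it assumes the positive class is the minority, which the description does not state. The helper `f1_score_binary` is a hypothetical name, not code from this repo.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels with a 1:9 ratio (assumption: class 1 is the minority).
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class.
all_negative = [0] * 100
accuracy_all_negative = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
f1_all_negative = f1_score_binary(y_true, all_negative)

# A classifier that finds 5 of the 10 positives and raises 5 false alarms.
partial = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
f1_partial = f1_score_binary(y_true, partial)
```

The always-negative classifier gets high accuracy but an F1 of zero, so a model tuned for accuracy alone could look good while never finding a positive.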

Tried

  • Oversampling : SMOTE, SMOTEN, ...
  • Undersampling : Neighbourhood Cleaning Rule, ...
  • Dimension Reduction : PCA, Kernel PCA (KPCA), ...
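
To give a feel for what SMOTE-style oversampling does, here is a simplified NumPy sketch of the core idea: each synthetic point is an interpolation between a minority sample and one of its nearest minority neighbors. This is an illustration only, not the imbalanced-learn implementation and not the code used in this repo; `smote_like`, `k`, and the toy data are assumptions.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Move a random fraction of the way from the base toward the neighbor.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy minority class: 20 samples with 4 features (made-up data).
X_minority = np.random.default_rng(1).normal(size=(20, 4))
X_synthetic = smote_like(X_minority, n_new=30)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds no new information about the class boundary, which may be one reason it did not help on this data.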

Best

  • Variable Selection with Kolmogorov-Smirnov test
  • Gaussian Naive Bayes Classifier
  • No over/undersampling
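
The combination above can be sketched roughly as follows. This is a hedged reconstruction on synthetic data, not the repo's notebook: it assumes the Kolmogorov-Smirnov test was applied per feature to compare the two classes' distributions, and the threshold `alpha=0.01`, the helper `ks_select`, and the generated data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def ks_select(X, y, alpha=0.01):
    """Keep features whose class-conditional distributions differ (two-sample KS test)."""
    pos, neg = X[y == 1], X[y == 0]
    pvalues = np.array(
        [ks_2samp(pos[:, j], neg[:, j]).pvalue for j in range(X.shape[1])]
    )
    return np.flatnonzero(pvalues < alpha)


# Synthetic stand-in for the masked data: roughly 1:9 imbalance, 50 features,
# of which only the first 5 carry signal (assumption for the demo).
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 3000, 50, 5
y = (rng.random(n_samples) < 0.1).astype(int)
X = rng.normal(size=(n_samples, n_features))
X[:, :n_informative] += 2.0 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Select features on the training split only, to avoid leaking test information.
selected = ks_select(X_train, y_train)

# Gaussian Naive Bayes on the selected features, with no resampling.
model = GaussianNB().fit(X_train[:, selected], y_train)
score = f1_score(y_test, model.predict(X_test[:, selected]))
```

Dropping features whose distributions look the same in both classes removes noise terms from the Naive Bayes likelihood, which is a plausible reason this simple pipeline held up well in 200 dimensions.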

Language

Record
1st place on the Kaggle leaderboard (TEAM_3)