/combat_class_imbalance

Investigation of class imbalance in a clinical dataset (Parkinson's Disease Telemonitoring Dataset from the UCI ML Repository)

Primary language: Jupyter Notebook

Combating class imbalance

Author
  • Jongoh (Andy) Jeong / December 16, 2019
Abstract
  • Class imbalance is common in real-world data, and especially so in clinical datasets. In this investigation, we compare several resampling techniques for addressing the imbalance, paired with a few classifiers, on a binary classification task. We use the Parkinson's dataset from the UCI Machine Learning Repository to generate data at various mixture (imbalance) ratios, perform the classification with Naive Bayes, Logistic Regression, and SVM classifiers, and evaluate their performance relative to each other. We observe that applying a combined resampling algorithm (SMOTEENN) with an RBF-kernel SVM classifier yields the best predictive performance.
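
The abstract mentions generating data at various mixture (imbalance) ratios. The sketch below shows one plausible way to do that in plain Python, by downsampling one class until the minority-to-majority ratio reaches a target value. The function name `make_imbalanced`, its parameters, and the toy data are hypothetical and for illustration only; the notebook's actual procedure may differ.

```python
import random
from collections import Counter


def make_imbalanced(samples, labels, minority_label, ratio, seed=0):
    """Downsample one class so that n_minority / n_majority is about `ratio`.

    Hypothetical helper: keeps every sample of the other class(es) and draws
    a random subset of the `minority_label` class without replacement.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Number of minority samples to keep, capped by what is available.
    n_keep = min(len(minority), max(1, round(ratio * len(majority))))
    kept = sorted(majority + rng.sample(minority, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data (not the Parkinson's dataset): 100 samples per class, one feature.
X = [[float(i)] for i in range(200)]
y = [0] * 100 + [1] * 100

for ratio in (1.0, 0.5, 0.1):
    X_sub, y_sub = make_imbalanced(X, y, minority_label=1, ratio=ratio)
    counts = Counter(y_sub)
    print(f"ratio={ratio}: majority={counts[0]}, minority={counts[1]}")
```

Each resulting subset could then be passed through a resampler and a classifier, so that performance can be compared across imbalance ratios.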