ml_exam_dataset

Used multiple algorithms to build a robust binary classifier to predict the label (1/0).


Problem Description

  1. The problem is to build a robust binary classifier that predicts the label (1/0).
  2. Instead of giving the features literal names such as “Age”, “Sex”, or “Height”, the attributes in the data set are simply labeled “feature 1”, “feature 2”, and so on.
  3. The task is to build a classifier that performs well on rigorous classification metrics: “Precision”, “Recall”, “F-Score”, “ROC-AUC”, and “Accuracy”.
  4. Each submission should report the “Precision”, “Recall”, “F-Score”, “ROC-AUC”, and “Accuracy” scores of the implemented classifier(s).
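All of the required metrics can be computed with scikit-learn. A minimal sketch on small illustrative label/score arrays (the actual exam predictions would be substituted here):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Illustrative values only (assumption: real predictions come from a fitted model).
y_true = [0, 1, 1, 0, 1, 0, 1, 1]                     # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]                     # hard 0/1 predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.95]   # predicted P(label = 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F-Score  :", f1_score(y_true, y_pred))
# ROC-AUC is computed from probability scores, not hard labels.
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))
```

Note that ROC-AUC takes the probability scores while the other four metrics take the thresholded 0/1 predictions.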

A few hints

  1. The user may apply feature selection, as not all of the features are necessarily relevant.
  2. This problem requires knowing the nature of the class distribution.
  3. Performance improves when the relevant hyperparameters are tuned.
  4. The train-test split percentage may improve or degrade classifier performance, so the split needs to be chosen with care.
  5. There are several classifiers to choose from, such as AdaBoost, XGBoost, KNN, SVM, Logistic Regression, Decision Tree, Random Forest, and Naïve Bayes.
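Hints 1–4 can be combined in a single scikit-learn workflow: inspect the class distribution, use a stratified train-test split, and tune the feature-selection size together with the model hyperparameters. A sketch on synthetic data (the exam data's file name and format are not specified, so `make_classification` stands in for it):

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the exam data (assumption: real data is loaded separately).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           weights=[0.7, 0.3], random_state=42)

# Hint 2: check the class distribution before choosing models and metrics.
print("class distribution:", Counter(y))

# Hint 4: the split percentage matters; stratify=y preserves the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Hints 1 and 3: feature selection plus hyperparameter tuning in one pipeline,
# so the selected features are chosen inside each cross-validation fold.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", RandomForestClassifier(random_state=42)),
])
param_grid = {
    "select__k": [5, 10, 20],
    "clf__n_estimators": [100, 300],
}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
print("best params :", search.best_params_)
print("test ROC-AUC:", search.score(X_test, y_test))
```

Wrapping the selector and classifier in one `Pipeline` avoids leaking test-fold information into the feature-selection step during cross-validation.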

Algorithms used

Logistic Regression, Decision Tree, AdaBoostClassifier, LightGBM, XGBoost, CatBoost, Random Forest, Support Vector Machine, K-Nearest Neighbors, Naive Bayes were used.
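Comparing that many algorithms is easiest with a single fit-and-score loop. A sketch restricted to the models that ship with scikit-learn (LightGBM, XGBoost, and CatBoost require their own packages, and the data here is again a synthetic stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the exam data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and report a common metric so the comparison is apples-to-apples.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {scores[name]:.3f}")
```

The same loop extends to the boosting libraries by adding their estimators (e.g. `xgboost.XGBClassifier`) to the `models` dict once those packages are installed.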