- The problem is to build a robust binary classifier that predicts the label (1/0).
- Instead of giving the features literal names such as “Age”, “Sex”, or “Height”, the attributes in the data set are provided anonymously as “feature 1”, “feature 2”, and so on.
- The task is to build a classifier that performs well on the standard classification metrics: Precision, Recall, F-score, ROC-AUC, and Accuracy.
- Each submission will report the Precision, Recall, F-score, ROC-AUC, and Accuracy scores of the implemented classifier(s).
- Feature selection may be applied, since not all of the features are necessarily relevant.
- The problem requires understanding the class distribution, e.g. whether the labels are balanced or imbalanced.
- Performance improvements can be expected when the relevant hyperparameters are tuned.
- The train-test split ratio can improve or degrade classifier performance and therefore needs to be chosen with care; a baseline workflow covering these points is sketched after this list.
- Several classifiers are available to choose from, such as AdaBoost, XGBoost, KNN, SVM, Logistic Regression, Decision Tree, Random Forest, and Naïve Bayes.
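The following is a minimal sketch of one possible baseline workflow with scikit-learn, illustrating the class-distribution check, stratified train-test split, feature selection, and metric reporting described above. The file name `data.csv`, the `label` column name, the 80/20 split, and `k=10` selected features are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Load the anonymised data set ("feature 1", "feature 2", ..., "label").
df = pd.read_csv("data.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Inspect the class distribution to see whether the labels are imbalanced.
print(y.value_counts(normalize=True))

# Stratified split keeps the class distribution similar in train and test;
# the 80/20 ratio is just one choice and can be varied.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Keep only the k most relevant features (k=10 is an arbitrary example).
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Fit a simple baseline classifier.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_sel, y_train)

# Report the required metrics on the held-out test set.
y_pred = clf.predict(X_test_sel)
y_prob = clf.predict_proba(X_test_sel)[:, 1]
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F-score  :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
```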
Logistic Regression, Decision Tree, AdaBoostClassifier, LightGBM, XGBoost, CatBoost, Random Forest, Support Vector Machine, K-Nearest Neighbors, and Naive Bayes were used.
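Below is a hedged sketch of how several of these models could be compared on the same metrics, followed by an illustrative hyperparameter search. It covers only the scikit-learn estimators; XGBoost, LightGBM, and CatBoost would be added analogously once their packages are installed. The variables `X_train_sel`, `y_train`, `X_test_sel`, and `y_test` are assumed to come from the previous sketch, and the grid values are arbitrary examples.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(probability=True),   # probability=True enables ROC-AUC
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    # XGBoost / LightGBM / CatBoost would be added here in the same way.
}

# Train each model on the same split and report the required metrics.
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    y_pred = model.predict(X_test_sel)
    y_prob = model.predict_proba(X_test_sel)[:, 1]
    print(f"{name}: "
          f"acc={accuracy_score(y_test, y_pred):.3f} "
          f"prec={precision_score(y_test, y_pred):.3f} "
          f"rec={recall_score(y_test, y_pred):.3f} "
          f"f1={f1_score(y_test, y_pred):.3f} "
          f"auc={roc_auc_score(y_test, y_prob):.3f}")

# Example of hyperparameter tuning for one model; the grid is illustrative.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    scoring="roc_auc", cv=5)
grid.fit(X_train_sel, y_train)
print("Best params:", grid.best_params_, "CV ROC-AUC:", grid.best_score_)
```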