man-o-war/Binary-Classification-XGB

Assessment solution for the Arya-ai binary classification problem

Jupyter Notebook, MIT license

Project File Structure:

Root /fldr

Alpha&Omega.ipynb <---- Main Jupyter notebook (Forgot to change name)
Assignement - Data Scientist (1).docx <---- Assessment problem document
Testing_predicitons.csv <---- Target class outputs of the Test Data
README.md <---- this very file
XGBFTW.sav <---- XGBoost model exported using pickle
requirements.txt <---- Environment snapshot
essentials_only_req.txt <---- Notebook-specific requirements
Data /fldr
- Training_set.csv <---- Training Dataset
- Test_set.csv <---- Testing Dataset

Data Stats:

Train Dataset Shape -> (3910,58)
Test Dataset Shape -> (691,57)
Dataset is Sparse and High Dimensional
Features are highly skewed
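
The sparsity and skew observations can be checked column by column. Below is a minimal sketch using only the standard library. The helper names `zero_fraction` and `skewness` are hypothetical, and the example column is made up for illustration rather than taken from the dataset.

```python
def zero_fraction(values):
    """Share of entries that are exactly zero (a simple sparsity measure)."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population skewness: third central moment divided by std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Made-up column: mostly zeros with a few large values (sparse, right-skewed).
column = [0, 0, 0, 0, 0, 0, 0, 1, 2, 40]
print(f"zero fraction: {zero_fraction(column):.2f}")
print(f"skewness:      {skewness(column):.2f}")
```

A zero fraction close to 1 indicates a sparse column, and a skewness far from 0 indicates a long tail, which is one common motivation for scaling features before training.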

Key Decisions:

Used a RandomForest classifier for feature selection.
Selected the top 30 features ranked by feature importance.
Metrics considered: binary cross-entropy (log loss) and ROC-AUC score.
Model of choice is XGBoost.
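
For reference, the two metrics named above can be written out directly from their definitions. This is a dependency-free sketch for illustration. The notebook presumably uses library implementations instead, and the labels and probabilities here are made-up values.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for illustration but slow on large datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(f"LogLoss: {log_loss(y_true, y_prob):.4f}")
print(f"ROC-AUC: {roc_auc(y_true, y_prob):.4f}")
```

Log loss penalizes confident wrong probabilities, while ROC-AUC only looks at how well the scores rank positives above negatives, so the two metrics complement each other.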

Process Flow - Main.ipynb (Alpha&Omega.ipynb)

EDA
Splitting the data
Feature Selection
Data Scaling - Normalization
Model Training
Prediction Metrics
Processing and Predicting on Test Data
Saving Model for Future Usage
Exporting Y_test Predicted scores
Generating requirements (this step has an important note, must read!)

Process Flow - Performance_print.py

Splitting the Data
Feature Selection
Importing Presaved model
Using presaved model to generate scores
Using Prettytable to print output table

Had fun making this!!