Detect whether a label belongs to class 0 or class 1 using machine learning predictive algorithms
The notebook should contain:
- EDA, feature selection, and preprocessing
- Model performance analysis in terms of validation and the risks involved
- Model predictions for the test dataset
- Dependencies and libraries, written in a separate Python file
- README file: approach to solving the problem and thought process
Task steps:
- Split the training set into train and validation sets in a 4:1 ratio
- Explain the model selection and apply a classification model
- Evaluate model accuracy
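The task steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the real training data, and picks `LogisticRegression` purely as a placeholder classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training set (assumption: the real data is not shown here).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

# A 4:1 ratio means 80% train and 20% validation.
# stratify=y keeps the class 0 / class 1 proportions the same in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Placeholder classifier; any scikit-learn classifier can be swapped in here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the held-out validation set.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"train size: {len(X_train)}, validation size: {len(X_val)}")
print(f"validation accuracy: {val_accuracy:.3f}")
```

Fixing `random_state` makes the split reproducible, which helps when comparing models later.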
Notebook 1 (Predictive model pipeline):
- Load and split the data
- Standardize the data and pipeline tree-based algorithms to handle data imbalance
- Select the algorithms with the highest accuracy
- Tune the selected algorithm and find the best parameters
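One possible shape for the Notebook 1 steps is sketched below. It assumes scikit-learn and synthetic data. The three candidate models, the use of `class_weight="balanced"` for imbalance, and the parameter grid are all illustrative choices, not the ones the notebook necessarily uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate tree-based models (illustrative). class_weight="balanced" is one
# way to account for class imbalance; GradientBoostingClassifier has no such option.
candidates = {
    "decision_tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "random_forest": RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0
    ),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Compare the candidates by mean 5-fold cross-validated accuracy.
scores = {}
for name, clf in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    scores[name] = cross_val_score(
        pipe, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()

best_name = max(scores, key=scores.get)

# Tune the best candidate; both parameters exist on all three models.
best_pipe = Pipeline([("scale", StandardScaler()), ("clf", candidates[best_name])])
param_grid = {"clf__max_depth": [2, 4, 8], "clf__min_samples_leaf": [1, 5]}
search = GridSearchCV(best_pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best model:", best_name)
print("best parameters:", search.best_params_)
print(f"validation accuracy: {search.score(X_val, y_val):.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so validation folds do not leak into the scaling statistics.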
Notebook 2 (Label detection Predictive Analytics):
- Explore the dataset
  - Check data relations
  - Data correlations
  - Missing values
  - Outliers
  - Different data types
  - Fix skewness and kurtosis in the data distribution
  - Fix outliers
  - Scale and undersample the data
- Split the cleaned data into 3 sets (test, train, and validation)
- Use 5-fold cross-validation and compare accuracy, recall, and ROC-AUC scores for the train, test, and validation sets
- Take the model and parameters from Notebook 1 and apply this model to the cleaned data
- Plot the confusion matrix, ROC-AUC curves, and expected vs. actual predictions
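The later Notebook 2 steps might look roughly like the sketch below. It assumes scikit-learn and NumPy and uses synthetic data. The 60/20/20 split, the random undersampling done by hand, and the `RandomForestClassifier` settings are placeholders standing in for the cleaned data and for whatever model and parameters Notebook 1 produces.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic, imbalanced stand-in for the cleaned dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Random undersampling: keep as many rows of each class as the smaller class has.
rng = np.random.default_rng(0)
n_keep = np.bincount(y).min()
keep = np.concatenate(
    [rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False) for c in (0, 1)]
)
X_bal, y_bal = X[keep], y[keep]

# Three sets: 60% train, 20% validation, 20% test (proportions are an assumption).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_bal, y_bal, test_size=0.4, stratify=y_bal, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

# Placeholder for the tuned model and parameters taken from Notebook 1.
model = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)

# 5-fold cross-validation on the training set with three metrics.
cv_results = cross_validate(
    model, X_train, y_train, cv=5, scoring=["accuracy", "recall", "roc_auc"]
)
for metric in ("accuracy", "recall", "roc_auc"):
    print(f"cv {metric}: {cv_results['test_' + metric].mean():.3f}")

# Fit once on the training set, then compare the same metrics across all three sets.
model.fit(X_train, y_train)
parts = {"train": (X_train, y_train), "validation": (X_val, y_val), "test": (X_test, y_test)}
report = {}
for name, (X_part, y_part) in parts.items():
    pred = model.predict(X_part)
    proba = model.predict_proba(X_part)[:, 1]  # probability of class 1
    report[name] = {
        "accuracy": accuracy_score(y_part, pred),
        "recall": recall_score(y_part, pred),
        "roc_auc": roc_auc_score(y_part, proba),
    }
    print(name, {k: round(v, 3) for k, v in report[name].items()})

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```

For the plots, scikit-learn's `ConfusionMatrixDisplay.from_predictions` and `RocCurveDisplay.from_estimator` (both of which need matplotlib) are one option. A large gap between train and validation or test scores would suggest overfitting.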