I am using Tune-sklearn & Optuna with ray here primarily for tuning Scikit-Learn / XGBoost / LightGBMmodels with cutting edge hyperparameter tuning techniques. This was some of the code for my class DAAN 888 project. I only adding the initial EDA, Feature selection & HPO (hyperparameter optimization) notebooks here.
The dataset used was obtained from creditriskanalytics.
The goal was to of accurately predict the likelihood of a borrower’ default of a mortgage loan from two classes (Binary classification):
0: Non-Default 1: Default
After reviewing the obtained results (73.2 % F1-score & 80.4 % Recall), we can further improve our results by:
- Using random Oversampling techniques to improve our accuracy on the default positive class.
- Trying another state of the art model like Roberta and fine tune.
- Extend our params in our HPO operation.
- collecting more data.
The final parameters of the LGBMClassifier best model are:
LGBMClassifier(bagging_fraction=0.4, learning_rate=0.05, max_depth=30, min_split_gain=0.2, n_estimators=2071, num_leaves=49, objective='binary', reg_alpha=5.0, reg_lambda=1.0, scale_pos_weight=2)