This work covers two Kaggle competitions: House Prices Prediction and Credit Default Risk Prediction.
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
https://www.kaggle.com/c/home-credit-default-risk
Both are tackled with advanced decision-tree-based models: regression for House Prices and classification for Credit Default Risk.
In House Prices Prediction, performance is evaluated with RMSLE (Root Mean Squared Logarithmic Error); in Credit Default Risk Prediction, with AUROC (Area Under the Receiver Operating Characteristic curve).
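For reference, both metrics can be computed with scikit-learn. A minimal sketch with hypothetical placeholder arrays (not competition data):

import numpy as np
from sklearn.metrics import mean_squared_log_error, roc_auc_score

# House Prices: RMSLE is the square root of the mean squared log error.
y_true = np.array([200000.0, 150000.0, 320000.0])   # placeholder prices
y_pred = np.array([210000.0, 140000.0, 305000.0])
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))

# Credit Default Risk: AUROC over predicted default probabilities.
y_label = np.array([0, 1, 0, 1])                    # placeholder labels
y_score = np.array([0.1, 0.8, 0.3, 0.6])
auroc = roc_auc_score(y_label, y_score)

print(f"RMSLE: {rmsle:.5f}, AUROC: {auroc:.5f}")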
In House Prices Prediction, I ranked 816/5011 with an RMSLE of 0.12549, compared to the top leaderboard score of 0.00000.
In Credit Default Risk Prediction, I scored an AUROC of 0.73610, compared to the best score of 0.81724; ranking was unavailable.
My submissions can be accessed from the submissions folder.
The problems are described in detail in the Kaggle links above.
After feature engineering, the following regression models are tested (a construction sketch follows the list):
Ridge
BaggingRegressor
    n_estimators=50
RandomForestRegressor
    n_estimators=50
XGBRegressor
    max_depth=5
    objective='reg:squarederror'
LGBMRegressor
VotingRegressor
    estimators=[ridge, bagging, random_forest, xgb, lgbm]
    n_jobs=-1
StackingRegressor
    estimators=[ridge, bagging, random_forest, xgb, lgbm]
    final_estimator=Ridge
    n_jobs=-1
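A minimal sketch of how these models can be wired together with scikit-learn, XGBoost, and LightGBM; the engineered training data is assumed to already exist and is not loaded here:

from sklearn.linear_model import Ridge
from sklearn.ensemble import (BaggingRegressor, RandomForestRegressor,
                              VotingRegressor, StackingRegressor)
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

ridge = Ridge()
bagging = BaggingRegressor(n_estimators=50)
random_forest = RandomForestRegressor(n_estimators=50)
xgb = XGBRegressor(max_depth=5, objective='reg:squarederror')
lgbm = LGBMRegressor()

base = [('ridge', ridge), ('bagging', bagging),
        ('random_forest', random_forest), ('xgb', xgb), ('lgbm', lgbm)]

# Unweighted average of the five base models' predictions.
voting = VotingRegressor(estimators=base, n_jobs=-1)

# Two-level stack: base model predictions feed a Ridge meta-learner.
stacking = StackingRegressor(estimators=base,
                             final_estimator=Ridge(), n_jobs=-1)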
Validation settings (an evaluation sketch follows the list):
    train_test_split(test_size=0.2, random_state=0)
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)
    cross_val_score(cv=kfold)
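A sketch of the corresponding evaluation loop, assuming the engineered feature matrix X, target y, and the voting/stacking models from the previous sketch:

import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import mean_squared_log_error

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in [('voting', voting), ('stacking', stacking)]:
    model.fit(X_train, y_train)
    r2 = model.score(X_val, y_val)                   # validation R^2
    # RMSLE on the held-out split (predictions assumed non-negative).
    rmsle = np.sqrt(mean_squared_log_error(y_val, model.predict(X_val)))
    # Default scoring for regressors is R^2, so this is the CV R^2 mean.
    cv_mean = cross_val_score(model, X, y, cv=kfold).mean()
    print(f"{name}: R2={r2:.4f} RMSLE={rmsle:.5f} CV mean={cv_mean:.4f}")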
VotingRegressor performs best overall, with the strongest combination of validation R² score, RMSLE, and cross-validation R² mean score.
After feature engineering, the following classification models are tested (a construction sketch follows the list):
XGBClassifier
    tree_method='gpu_hist'
    gpu_id=0
LGBMClassifier
    device='gpu'
RandomForestClassifier
    n_estimators=50
StackingClassifier
    estimators=[xgb, lgbm, random_forest]
    final_estimator=LGBMClassifier
    n_jobs=-1
Validation setting: train_test_split(test_size=0.2, random_state=42)
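A minimal sketch of this setup; X and y are assumed to be the engineered features and default labels. Note the GPU flags: tree_method='gpu_hist' and gpu_id are the pre-2.0 XGBoost GPU options, and device='gpu' requires a GPU-enabled LightGBM build:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

xgb = XGBClassifier(tree_method='gpu_hist', gpu_id=0)
lgbm = LGBMClassifier(device='gpu')
random_forest = RandomForestClassifier(n_estimators=50)

# Stack the three base classifiers under an LGBM meta-learner.
stacking = StackingClassifier(
    estimators=[('xgb', xgb), ('lgbm', lgbm),
                ('random_forest', random_forest)],
    final_estimator=LGBMClassifier(device='gpu'),
    n_jobs=-1)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)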
A GPU is leveraged here because the classification task requires considerably more computing power.
LGBMClassifier performs best, achieving the highest validation AUROC score.
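A sketch of that comparison, assuming the classifiers and split from the previous sketch:

from sklearn.metrics import roc_auc_score

scores = {}
for name, model in [('xgb', xgb), ('lgbm', lgbm),
                    ('random_forest', random_forest),
                    ('stacking', stacking)]:
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]   # predicted P(default)
    scores[name] = roc_auc_score(y_val, proba)

best = max(scores, key=scores.get)
print(f"best model: {best} (AUROC={scores[best]:.5f})")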