This work covers two Kaggle competitions: House Prices Prediction and Credit Default Risk Prediction.
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
https://www.kaggle.com/c/home-credit-default-risk
Both are tackled with advanced decision-tree-based models: regression for House Prices and classification for Credit Default Risk.
In House Prices Prediction, performance is evaluated with RMSLE (Root Mean Squared Logarithmic Error); in Credit Default Risk Prediction, with AUROC (Area Under the Receiver Operating Characteristic curve).
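For reference, both metrics can be computed with scikit-learn. A minimal sketch with hypothetical placeholder arrays (not competition data):

import numpy as np
from sklearn.metrics import mean_squared_log_error, roc_auc_score

# House Prices: RMSLE is the square root of the mean squared log error.
y_true = np.array([200000.0, 150000.0, 320000.0])   # placeholder prices
y_pred = np.array([210000.0, 140000.0, 305000.0])
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))

# Credit Default Risk: AUROC over predicted default probabilities.
y_label = np.array([0, 1, 0, 1])                    # placeholder labels
y_score = np.array([0.1, 0.8, 0.3, 0.6])
auroc = roc_auc_score(y_label, y_score)

print(f"RMSLE: {rmsle:.5f}, AUROC: {auroc:.5f}")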
In House Prices Prediction, I ranked 816/5011 with an RMSLE of 0.12549, compared to the top leaderboard score of 0.00000.
In Credit Default Risk Prediction, I scored an AUROC of 0.73610, compared to the best score of 0.81724; ranking was unavailable.
My submissions can be accessed from the submissions folder.
The problems are described in detail in the Kaggle links above.
After feature engineering, the following regression models are tested (a construction sketch follows the list):
Ridge
BaggingRegressor
    n_estimators=50
RandomForestRegressor
    n_estimators=50
XGBRegressor
    max_depth=5
    objective='reg:squarederror'
LGBMRegressor
VotingRegressor
    estimators=[ridge, bagging, random_forest, xgb, lgbm]
    n_jobs=-1
StackingRegressor
    estimators=[ridge, bagging, random_forest, xgb, lgbm]
    final_estimator=Ridge
    n_jobs=-1
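A minimal sketch of how these models can be wired together with scikit-learn, XGBoost, and LightGBM; the engineered training data is assumed to already exist and is not loaded here:

from sklearn.linear_model import Ridge
from sklearn.ensemble import (BaggingRegressor, RandomForestRegressor,
                              VotingRegressor, StackingRegressor)
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

ridge = Ridge()
bagging = BaggingRegressor(n_estimators=50)
random_forest = RandomForestRegressor(n_estimators=50)
xgb = XGBRegressor(max_depth=5, objective='reg:squarederror')
lgbm = LGBMRegressor()

base = [('ridge', ridge), ('bagging', bagging),
        ('random_forest', random_forest), ('xgb', xgb), ('lgbm', lgbm)]

# Unweighted average of the five base models' predictions.
voting = VotingRegressor(estimators=base, n_jobs=-1)

# Two-level stack: base model predictions feed a Ridge meta-learner.
stacking = StackingRegressor(estimators=base,
                             final_estimator=Ridge(), n_jobs=-1)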
Validation settings (an evaluation sketch follows the list):
    train_test_split(test_size=0.2, random_state=0)
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)
    cross_val_score(cv=kfold)
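A sketch of the corresponding evaluation loop, assuming the engineered feature matrix X, target y, and the voting/stacking models from the previous sketch:

import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import mean_squared_log_error

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in [('voting', voting), ('stacking', stacking)]:
    model.fit(X_train, y_train)
    r2 = model.score(X_val, y_val)                   # validation R^2
    # RMSLE on the held-out split (predictions assumed non-negative).
    rmsle = np.sqrt(mean_squared_log_error(y_val, model.predict(X_val)))
    # Default scoring for regressors is R^2, so this is the CV R^2 mean.
    cv_mean = cross_val_score(model, X, y, cv=kfold).mean()
    print(f"{name}: R2={r2:.4f} RMSLE={rmsle:.5f} CV mean={cv_mean:.4f}")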
VotingRegressor performs best overall, with the strongest combination of validation R² score, RMSLE, and cross-validation R² mean score.
After feature engineering, the following classification models are tested (a construction sketch follows the list):
XGBClassifier
    tree_method='gpu_hist'
    gpu_id=0
LGBMClassifier
    device='gpu'
RandomForestClassifier
    n_estimators=50
StackingClassifier
    estimators=[xgb, lgbm, random_forest]
    final_estimator=LGBMClassifier
    n_jobs=-1
Validation setting: train_test_split(test_size=0.2, random_state=42)
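A minimal sketch of this setup; X and y are assumed to be the engineered features and default labels. Note the GPU flags: tree_method='gpu_hist' and gpu_id are the pre-2.0 XGBoost GPU options, and device='gpu' requires a GPU-enabled LightGBM build:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

xgb = XGBClassifier(tree_method='gpu_hist', gpu_id=0)
lgbm = LGBMClassifier(device='gpu')
random_forest = RandomForestClassifier(n_estimators=50)

# Stack the three base classifiers under an LGBM meta-learner.
stacking = StackingClassifier(
    estimators=[('xgb', xgb), ('lgbm', lgbm),
                ('random_forest', random_forest)],
    final_estimator=LGBMClassifier(device='gpu'),
    n_jobs=-1)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)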
A GPU is leveraged here because the classification task requires considerably more computing power.
LGBMClassifier performs best, achieving the highest validation AUROC score.
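A sketch of that comparison, assuming the classifiers and split from the previous sketch:

from sklearn.metrics import roc_auc_score

scores = {}
for name, model in [('xgb', xgb), ('lgbm', lgbm),
                    ('random_forest', random_forest),
                    ('stacking', stacking)]:
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]   # predicted P(default)
    scores[name] = roc_auc_score(y_val, proba)

best = max(scores, key=scores.get)
print(f"best model: {best} (AUROC={scores[best]:.5f})")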