This repository contains models for the Kaggle challenge Allstate Claims Severity, a regression problem for predicting insurance claim severity.
The best result, an MAE of 1136.009, was achieved by stacking an XGBoost model and a multi-layer perceptron.
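The competition metric, mean absolute error (MAE), can be computed with scikit-learn; a minimal sketch (the array values below are illustrative, not actual predictions from this repository):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative claim-severity values, not real repository output
y_true = np.array([2213.18, 1283.60, 3005.09])
y_pred = np.array([2100.00, 1400.00, 2900.00])

# MAE is the mean of |y_true - y_pred|
mae = mean_absolute_error(y_true, y_pred)
print(mae)
```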
The repository is organized as follows:
- Exploratory Data Analysis - explore_allstate.py
- Random Forest Baseline Model - rf_allstate.py
- XGBoost - xgb_allstate.py
- Neural Network Version 1 - mlp_allstate.py
- Neural Network with Hyperopt - mlp_hyperopt.py
- Stacked Model - stacking_allstate.py
The models and ensemble folders contain the output from the XGBoost and MLP models, which are then stacked by the linear model in stacking_allstate.py.
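Stacking base-model predictions with a linear meta-model can be sketched as follows (a minimal illustration on synthetic data; the repository's actual base models are XGBoost and an MLP, stood in for here by random arrays):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for out-of-fold predictions from two base models
# on the training rows (in this repository: XGBoost and an MLP)
y_train = rng.uniform(100, 5000, size=200)
xgb_oof = y_train + rng.normal(0, 300, size=200)
mlp_oof = y_train + rng.normal(0, 400, size=200)

# Level-2 design matrix: one column per base model
X_meta = np.column_stack([xgb_oof, mlp_oof])

# The linear meta-model learns how to weight the base predictions
meta = LinearRegression().fit(X_meta, y_train)

# At test time, combine the base models' test-set predictions the same way
xgb_test, mlp_test = np.array([1500.0]), np.array([1700.0])
blend = meta.predict(np.column_stack([xgb_test, mlp_test]))
print(blend)
```

The learned coefficients act as a data-driven weighted average of the two base models.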
Folder Structure:
- data
  - train.csv
  - test.csv
- explore
  - explore_allstate.py: exploratory data analysis, generate graphs and explore the data
- predict
  - rf_allstate.py: generate prediction file with random forest
  - xgb_allstate.py: ...... with XGBoost
  - mlp_allstate.py: ...... with MLP
  - stacking_allstate.py: stack all predictors together
- utilities
  - data_preprocess.py: preprocess the data, fill in missing values, etc.
- model_output
  - xgb_pred_fold_0.txt # predicted weights from xgb on the CV folds -> train set for stacking
  - xgb_pred_fold_1.txt
  - xgb_pred_fold_2.txt
  - xgb_pred_test.txt # predicted weights from xgb trained on the whole train set -> test set for stacking
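The per-fold files above follow a standard out-of-fold scheme: each base model predicts the rows it did not train on, and those predictions become the stacker's training set. A minimal sketch, assuming 3-fold cross-validation (the Ridge model and file names here are illustrative stand-ins for the repository's XGBoost outputs):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge  # stand-in for the XGBoost base model

rng = np.random.default_rng(42)
X = rng.normal(size=(90, 5))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 0.0]) + rng.normal(0, 0.1, size=90)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for i, (train_idx, val_idx) in enumerate(kf.split(X)):
    model = Ridge().fit(X[train_idx], y[train_idx])
    # Predictions on the held-out fold -> training rows for the stacker
    np.savetxt(f"pred_fold_{i}.txt", model.predict(X[val_idx]))

# Refit on all training rows; its test-set predictions feed the stacker at test time
full_model = Ridge().fit(X, y)
np.savetxt("pred_test.txt", full_model.predict(X[:10]))  # X[:10] stands in for the test set
```

Because every training row appears in exactly one held-out fold, concatenating the fold files yields one unbiased prediction per training row.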