
This project aims to reduce the mean absolute error (MAE) of regression predictions on a real-life healthcare dataset using various data processing, feature selection, and regression methods.

Training Dataset: 432600 rows, 18 columns
Test Dataset: 103500 rows, 16 columns

Baseline Model: Lasso Regressor (alpha=0.1)
Baseline MAE: 5.4 (validation dataset)

Final Model: SVR (Support Vector Regressor)
Final MAE: 4.0 (validation dataset), 3.7 (full test dataset)
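
The baseline-versus-final comparison could be set up roughly as follows. This is a minimal sketch using scikit-learn on synthetic data, not the project's actual pipeline: the data, the 80/20 split, the feature scaling, and the default SVR hyperparameters are all assumptions. Only Lasso with alpha=0.1, SVR as the final model family, and MAE as the metric come from the summary above, so the numbers it prints are unrelated to the reported results.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the healthcare data (assumption, not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X[:, 0] * 3.0 + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=500)

# Hold out a validation set (the 80/20 split is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def validation_mae(model):
    """Fit on the training split and return the MAE on the validation split."""
    model.fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))


# Baseline from the summary: Lasso with alpha=0.1.
baseline_mae = validation_mae(Lasso(alpha=0.1))

# Final model family from the summary: SVR. Scaling and default
# hyperparameters are assumptions; the project's settings are not stated.
svr_mae = validation_mae(make_pipeline(StandardScaler(), SVR()))

print(f"Lasso MAE: {baseline_mae:.3f}")
print(f"SVR MAE:   {svr_mae:.3f}")
```

Evaluating both models through the same helper keeps the comparison on identical splits, which is what makes a drop in validation MAE meaningful.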

Interesting Fact and Data Transformation: Within a single observation, only four columns changed from row to row; the values in all other columns were fixed. The data was therefore transformed into a wide dataframe at the observation level, with every value of the continuous variables becoming its own column.

For example:
col1, col2, col3, col4, col5
1, a, 2, b, 5.5
1, a, 2, b, 6.5
1, a, 2, b, 7.5
1, a, 2, b, 8.5

Modified Dataset: 1, a, 2, b, 5.5, 6.5, 7.5, 8.5
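
The long-to-wide reshaping in the example above can be sketched with pandas. This is an illustration under assumptions, not the project's actual code: the column names (col1 to col5), the helper column "step", and the choice of groupby/cumcount followed by unstack are all hypothetical.

```python
import pandas as pd

# Toy long-format data mirroring the example; column names are placeholders.
long_df = pd.DataFrame({
    "col1": [1, 1, 1, 1],
    "col2": ["a", "a", "a", "a"],
    "col3": [2, 2, 2, 2],
    "col4": ["b", "b", "b", "b"],
    "col5": [5.5, 6.5, 7.5, 8.5],
})

fixed_cols = ["col1", "col2", "col3", "col4"]  # constant within an observation
value_col = "col5"                             # varies from row to row

# Number the rows inside each observation: 0, 1, 2, ...
long_df["step"] = long_df.groupby(fixed_cols).cumcount()

# Move each step into its own column, leaving one row per observation.
wide_df = (
    long_df.set_index(fixed_cols + ["step"])[value_col]
    .unstack("step")
    .add_prefix(f"{value_col}_")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "step" label on the columns

print(wide_df)
```

When several continuous columns vary within an observation, the same idea should extend by selecting a list of value columns before `unstack`, which yields one block of step columns per variable.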