This repository contains files for the Housing Prices regression project.
Data Source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
Use Case: Predict the sale price of houses from the given data.
Motivation: This dataset has over 80 columns and presents challenges in the EDA, feature engineering, and feature selection stages of the data science life cycle. Since these phases often take up more than half of a typical data science project, I wish to get my hands dirty and tackle the numerous techniques required for such a task.
Project Life Cycle:
- Data Exploration:
a. Discrete features evaluation.
b. Continuous features evaluation (balanced or imbalanced data?).
c. Categorical features evaluation.
d. Outlier evaluation.
- Feature Engineering:
a. Categorical encoding.
b. Skewed data transformation.
c. Handling null values.
d. Handling outliers.
- Feature Selection
- Model Building (Linear Regression, Random Forest, XGBoost)
- Model Evaluation
- Visualization
- Deployment
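
The feature engineering steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the function name `engineer_features`, the `skew_threshold` default, the choice of median imputation, IQR clipping, `log1p`, and one-hot encoding are all assumptions, and the toy rows are made-up values that only borrow column names from the Kaggle data.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame, target: str, skew_threshold: float = 0.75) -> pd.DataFrame:
    """Hypothetical feature-engineering pass: nulls, outliers, skew, encoding."""
    out = df.copy()
    numeric_cols = [c for c in out.select_dtypes(include="number").columns if c != target]
    categorical_cols = [c for c in out.columns if c not in numeric_cols and c != target]

    # Handle null values: median for numeric columns, an explicit label for categorical ones.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        out[col] = out[col].fillna("Missing")

    # Handle outliers: clip each numeric column to 1.5 * IQR beyond its quartiles.
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # Transform skewed data: log1p on non-negative columns whose skewness exceeds the threshold.
    for col in numeric_cols:
        if out[col].min() >= 0 and abs(out[col].skew()) > skew_threshold:
            out[col] = np.log1p(out[col])

    # Categorical encoding: one-hot encode the non-numeric columns.
    return pd.get_dummies(out, columns=categorical_cols)


# Toy data with made-up values (assumption: not taken from the real dataset).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 215245],
    "LotFrontage": [65.0, 80.0, np.nan, 60.0, 84.0, 85.0],
    "Neighborhood": ["CollgCr", "Veenker", None, "Crawfor", "NoRidge", "CollgCr"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 375000],
})
features = engineer_features(toy, target="SalePrice")
print(features.head())
```

The target column is left untouched here so that any transformation of `SalePrice` stays an explicit, separate decision. In a real workflow the medians and clipping bounds would be computed on the training split only and then reused on the test split.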
This project is in progress. As such, this space will be regularly updated.