Boston House Prices Prediction: A case study
Solo project: by Anurima Dey
Purpose:
I started my journey into machine learning algorithms with this project. Stuck on which topic to choose ("classification!", "regression?", "classification??", "regression!!"), I thought, why not start with the topic whose first letter has the higher ASCII value, which is "R", i.e. regression... (kudos to random logic :'D). The main purpose, though, was to understand a regression problem (i.e. one where we predict a continuous variable) and to apply machine learning techniques to reach the objective, i.e. decent predictive accuracy. The continuous variable here is MEDV (the median value of owner-occupied homes). This BHPP (Boston House Prices Prediction) is also a beginner's project for starting a data analyst career, as listed on Kaggle.
Objective:
Let's be precise:
- Obtaining and cleaning the data.
- Data exploration through data visualization, popularly termed Exploratory Data Analysis (EDA).
- Optimal and necessary feature selection and data partitioning (train and test).
- Fitting various regression models, and a Random Forest model with grid search and cross-validation.
- Applying PCA to check whether it increases accuracy.
- Checking accuracy with the error metrics RMSE, MAPE, MAE, and MSE.
- Creating self-help plotting packages, available in the R_utils repository.
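The four error measures named in the list above can be written out in a few lines. This is only a minimal sketch in plain Python, not the code used in the project: the function names are illustrative, and the sample values are made up rather than taken from the Boston dataset. MAPE is reported in percent and assumes that no actual value is zero.

```python
import math


def mse(actual, predicted):
    """Mean squared error: average of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE, in the units of the target."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values purely for illustration, not taken from the Boston dataset.
actual = [20.0, 25.0, 40.0]
predicted = [22.0, 24.0, 37.0]

for name, metric in [("MSE", mse), ("RMSE", rmse), ("MAE", mae), ("MAPE", mape)]:
    print(name, round(metric(actual, predicted), 4))
```

RMSE and MAE are in the same units as MEDV, which makes them easier to read than MSE, while MAPE gives a scale-free percentage.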
Some visualizations of the dataset:
The dataset was obtained from Kaggle and is stored here
The Description of the dataset:
The Histogram of MEDV
Correlation between various attributes
Procedure:
I mainly approached the problem with three different linear regression models, using repeated cross-validation for model training.
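To make the idea of repeated cross-validation concrete, here is a rough sketch in plain Python. It is not the project's training code: it uses a single-predictor least-squares fit as a stand-in for the linear regression models, synthetic data instead of the Boston dataset, and hypothetical function names. The choices of 5 folds and 3 repeats are assumptions as well.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def repeated_kfold_rmse(xs, ys, k=5, repeats=3, seed=0):
    """Average held-out RMSE over `repeats` reshuffled rounds of k-fold CV."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    fold_scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # a fresh random partition for each repeat
        for fold in range(k):
            test_idx = indices[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            intercept, slope = fit_simple_ols(
                [xs[i] for i in train_idx], [ys[i] for i in train_idx]
            )
            sq_err = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
            fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / len(fold_scores)


# Synthetic data for illustration only: y = 3 + 2x plus Gaussian noise.
noise = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 + 2.0 * x + noise.gauss(0, 1.0) for x in xs]

score = repeated_kfold_rmse(xs, ys, k=5, repeats=3)
print("repeated 5-fold CV RMSE:", round(score, 3))
```

Repeating the k-fold split with different shuffles averages out the luck of any single partition, so the resulting error estimate is more stable than one from a single train/test split.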
Model 1: