We learnt about the New York house prices dataset in the in-class session. We learnt
- What is linear regression
- How gradient descent algorithm works to minimize the cost function
- What are the assumptions of linear regression
We also implemented linear regression using sklearn
library in which we
- Fit a linear regression model
- Predicted house prices using the fitted model
- Calculated MSE for predicted and actual house prices
Now, let's take this forward and increase our understanding of linear regression!
This assignment is a series of simple tasks, in which we will be fitting a linear regression model on the house pricing data and validating some of the assumptions of linear regression.
By the end of this assignment,
- You will be able to confidently train a linear regression model and predict values of the target variables
- You will have a better understanding about the assumptions of linear regression and how to validate them. By completing this project you have an opportunity to win 800 points!!
Assumptions of linear regression model play an extremely important role in the model performance and stability. Hence, it is very important to validate these assumptions. Validating these assumptions can give us deeper insights into the kind of data we are dealing with and steps that could be taken to improve the results of the linear model.
So, let's get started.
Here are some of the imports that we will be using throughout the assignment.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint
%matplotlib inline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score