In this lab, you'll practice fitting a multiple linear regression model on the Ames Housing dataset!
You will be able to:
- Perform a multiple linear regression using StatsModels
- Visualize individual predictors within a multiple linear regression
- Interpret multiple linear regression coefficients from raw, un-transformed data
The Ames Housing dataset is a newer (2011) replacement for the classic Boston Housing dataset. Each record represents a residential property sale in Ames, Iowa. It contains many different potential predictors, and the target variable is SalePrice.
import pandas as pd
ames = pd.read_csv("ames.csv", index_col=0)
ames
ames.describe()
We will focus specifically on a subset of the overall dataset. These features are:
- LotArea: Lot size in square feet
- 1stFlrSF: First floor square feet
- GrLivArea: Above grade (ground) living area square feet
ames_subset = ames[['LotArea', '1stFlrSF', 'GrLivArea', 'SalePrice']].copy()
ames_subset
For each feature in the subset, create a scatter plot that shows the feature on the x-axis and SalePrice on the y-axis.
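As a hint for the plotting pattern, here is a minimal sketch on synthetic data rather than the Ames data. The DataFrame toy and its columns size, age, and price are hypothetical names made up for illustration; the same loop should carry over to ames_subset once you swap in the real column names.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical columns, NOT the Ames dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "size": rng.uniform(500, 3000, 100),
    "age": rng.uniform(0, 50, 100),
})
toy["price"] = 100 * toy["size"] - 500 * toy["age"] + rng.normal(0, 20000, 100)

# One subplot per feature, all sharing the target on the y-axis
features = ["size", "age"]
fig, axes = plt.subplots(ncols=len(features), figsize=(10, 4), sharey=True)
for ax, feature in zip(axes, features):
    ax.scatter(toy[feature], toy["price"], alpha=0.5)
    ax.set_xlabel(feature)
axes[0].set_ylabel("price")
fig.tight_layout()
```

Sharing the y-axis makes it easier to compare how tightly each feature tracks the target.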
# Your code here - import relevant library, create scatter plots
# Your written answer here - do these seem like good candidates for linear regression?
Set the dependent variable (y) to be the SalePrice, then choose one of the features shown in the subset above to be the baseline independent variable (X).
Build a linear regression using StatsModels, describe the overall model performance, and interpret its coefficients.
# Your code here - define y and baseline X
# Your code here - import StatsModels, fit baseline model, display results
# Your written answer here - interpret model results
For this model, use all of the features in ames_subset (every column except the target, SalePrice).
# Your code here - define X
# Your code here - fit model and display results
# Your written answer here - interpret model results. Does this model seem better than the previous one?
Using your model from Step 3, visualize each of the features using partial regression plots.
# Your code here - create partial regression plots for each predictor
# Your written answer here - explain what you see, and how this relates
# to what you saw in Step 1. What do you notice?
Re-create the multiple regression model in scikit-learn, and check whether you get the same R-Squared, intercept, and coefficients.
# Your code here - import linear regression from scikit-learn and create and fit model
# Your code here - compare R-Squared
# Your code here - compare intercept and coefficients
Congratulations! You fitted your first multiple linear regression model on the Ames Housing data using StatsModels.