In this lab, you'll practice adding polynomial terms to your regression model!
You will be able to:
- Use sklearn's built-in capabilities to create polynomial features
Here is the dataset you will be working with in this lab:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('sample_data.csv')
df.head()
Run the following line of code. You will notice that the data is clearly non-linear. Begin to think about what degree of polynomial you believe will fit it best.
plt.scatter(df['x'], df['y'], color='green', s=50, marker='.');
The next step is to split the data into training and test sets. Set the `random_state` to 42 and assign 75% of the data to the training set.
# Split data into 75-25 train-test split
from sklearn.model_selection import train_test_split

y = df['y']
X = df.drop(columns='y')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
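Before moving on to the polynomial fits, it can be useful to score a plain straight-line model as a baseline that the higher-degree fits should beat. This is an optional aside rather than one of the lab's required steps; it's a minimal sketch that reuses the split above:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Optional baseline: fit a degree-1 (straight-line) model on the raw features
reg_baseline = LinearRegression().fit(X_train, y_train)

# A low test R^2 here would confirm that a straight line underfits this data
print("degree 1", r2_score(y_test, reg_baseline.predict(X_test)))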
Now it's time to determine the optimal degree of polynomial features for a model that is fit to this data. For each of the second, third, and fourth degrees:

- Instantiate `PolynomialFeatures()` with that degree
- Fit and transform the `X_train` features
- Instantiate and fit a linear regression model on the training data
- Transform the test data into polynomial features
- Use the model you built above to make predictions using the transformed test data
- Evaluate model performance on the test data using `r2_score()`
- In order to plot how well the model performs on the full dataset, transform `X` using `poly`
- Use the same model (`reg_poly`) to make predictions using `X_poly`
# Import relevant modules and functions
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

colors = ['yellow', 'lightgreen', 'blue']
plt.figure(figsize=(10, 6))
plt.scatter(df['x'], df['y'], color='green', s=50, marker='.', label='plot points')

# We'll fit 3 different polynomial regression models, from degree 2 to degree 4
for index, degree in enumerate([2, 3, 4]):

    # Instantiate PolynomialFeatures with the given degree
    poly = PolynomialFeatures(degree)

    # Fit and transform X_train
    X_poly_train = poly.fit_transform(X_train)

    # Instantiate and fit a linear regression model to the polynomial-transformed train features
    reg_poly = LinearRegression().fit(X_poly_train, y_train)

    # Transform the test data into polynomial features
    X_poly_test = poly.transform(X_test)

    # Get predicted values for the transformed polynomial test data
    y_pred = reg_poly.predict(X_poly_test)

    # Evaluate model performance on the test data
    print("degree %d" % degree, r2_score(y_test, y_pred))

    # Transform the full data
    X_poly = poly.transform(X)

    # Now, we want to see what the model predicts for the entire dataset
    y_poly = reg_poly.predict(X_poly)

    # Create plot of predicted values
    plt.plot(X, y_poly, color=colors[index], linewidth=2, label='degree %d' % degree)

plt.legend(loc='lower left')
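As a side note, the transform-then-fit pattern above can also be bundled into a single estimator with sklearn's `Pipeline`. This is optional here, but it's a common idiom because the pipeline applies the identical transformation wherever you call `predict`. A minimal sketch for a single degree:

from sklearn.pipeline import make_pipeline

# Chain the polynomial transform and the linear regression into one estimator
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)

# The pipeline transforms the test features automatically before predicting
print("degree 3 (pipeline)", r2_score(y_test, model.predict(X_test)))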
Great job! You now know how to include polynomial terms in your linear models.