/dsc-polynomial-regression-lab

Primary LanguageJupyter NotebookOtherNOASSERTION

Polynomial Regression - Lab

Introduction

In this lab, you'll practice your knowledge on adding polynomial terms to your regression model!

Objectives

You will be able to:

  • Use sklearn's built in capabilities to create polynomial features

Dataset

Here is the dataset you will be working with in this lab:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('sample_data.csv')

df.head()

Run the following line of code. You will notice that the data is clearly of non-linear shape. Begin to think about what degree polynomial you believe will fit it best.

plt.scatter(df['x'], df['y'], color='green', s=50, marker='.');

Train-test split

The next step is to split the data into training and test sets. Set the random_state to 42 and assign 75% of the data in the training set.

# Split data into 75-25 train-test split 
from sklearn.model_selection import train_test_split
y = df['y']
X = df.drop(columns='y', axis=1)
X_train, X_test, y_train, y_test = None

Build polynomial models

Now it's time to determine the optimal degree of polynomial features for a model that is fit to this data. For each of second, third and fourth degrees:

  • Instantiate PolynomialFeatures() with the number of degrees
  • Fit and transform the X_train features
  • Instantiate and fit a linear regression model on the training data
  • Transform the test data into polynomial features
  • Use the model you built above to make predictions using the transformed test data
  • Evaluate model performance on the test data using r2_score()
  • In order to plot how well the model performs on the full dataset, transform X using poly
  • Use the same model (reg_poly) to make predictions using X_poly
# Import relevant modules and functions
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

colors = ['yellow', 'lightgreen', 'blue']
plt.figure(figsize=(10, 6))
plt.scatter(df['x'], df['y'], color='green', s=50, marker='.', label='plot points')

# We'll fit 3 different polynomial regression models from degree 2 to degree 4
for index, degree in enumerate([2, 3, 4]):
    
    # Instantiate PolynomialFeatures
    poly = None
    
    # Fit and transform X_train
    X_poly_train = None
    
    # Instantiate and fit a linear regression model to the polynomial transformed train features
    reg_poly = None
    
    # Transform the test data into polynomial features
    X_poly_test = None
    
    # Get predicted values for transformed polynomial test data  
    y_pred = None
    
    # Evaluate model performance on test data
    print("degree %d" % degree, r2_score(y_test, y_pred))
    
    # Transform the full data
    X_poly = None
    
    # Now, we want to see what the model predicts for the entire data 
    y_poly = None
    
    # Create plot of predicted values
    plt.plot(X, y_poly, color = colors[index], linewidth=2, label='degree %d' % degree)
    plt.legend(loc='lower left')

Summary

Great job! You now know how to include polynomials in your linear models.