Dealing with Categorical Variables - Lab

Introduction

In this lab, you'll explore the Ames Housing dataset for categorical variables, and you'll transform your data so you'll be able to use categorical data as predictors!

Objectives

You will be able to:

  • Determine whether variables are categorical or continuous
  • Use one hot encoding to create dummy variables
  • Describe why dummy variables are necessary

Importing the Ames Housing dataset

Let's start by importing the Ames Housing dataset from ames.csv into a pandas dataframe using pandas read_csv()

# Import your data

Now look at the first five rows of ames:

# Inspect the first few rows

Variable Descriptions

Look in data_description.txt for a full description of all variables.

A preview of some of the columns:

LotArea: Size of the lot in square feet

MSZoning: Identifies the general zoning classification of the sale.

   A	 Agriculture
   C	 Commercial
   FV	Floating Village Residential
   I	 Industrial
   RH	Residential High Density
   RL	Residential Low Density
   RP	Residential Low Density Park 
   RM	Residential Medium Density

OverallCond: Rates the overall condition of the house

   10	Very Excellent
   9	 Excellent
   8	 Very Good
   7	 Good
   6	 Above Average	
   5	 Average
   4	 Below Average	
   3	 Fair
   2	 Poor
   1	 Very Poor

KitchenQual: Kitchen quality

   Ex	Excellent
   Gd	Good
   TA	Typical/Average
   Fa	Fair
   Po	Poor

YrSold: Year Sold (YYYY)

SalePrice: Sale price of the house in dollars

Let's inspect all features using .describe() and .info()

# Use .describe()
# Use .info()

Plot Categorical Variables

Now, pick 6 categorical variables and plot them against SalePrice with a bar graph for each variable. All 6 bar graphs should be on the same figure.

import matplotlib.pyplot as plt
%matplotlib inline

# Create bar plots

Create dummy variables

Create dummy variables for the six categorical features you chose remembering to drop the first. Drop the categorical columns that you used, concat the dummy columns to our continuous variables and asign it to a new variable ames_preprocessed

# Create dummy variables for your six categorical features

Summary

In this lab, you practiced your knowledge of categorical variables on the Ames Housing dataset! Specifically, you practiced distinguishing continuous and categorical data. You then created dummy variables using one hot encoding.