For this lab, we will use the dataset from the Customer Analysis Business Case, which can be found in the files_for_lab
folder. In this lab we will explore categorical data.
-
Import the necessary libraries if you are starting a new notebook. Use the same data as in the previous lab: we_fn_use_c_marketing_customer_value_analysis.csv
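
A minimal sketch of the import and load step. The folder location in `DATA_PATH` and the helper names `standardize_columns` and `load_customer_data` are assumptions for illustration. The renaming step assumes the raw headers may contain spaces and capital letters, which would explain why later tasks refer to columns such as policy_type.

```python
import pandas as pd

# File name taken from the lab text; the folder location is an assumption.
DATA_PATH = "files_for_lab/we_fn_use_c_marketing_customer_value_analysis.csv"


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    return out


def load_customer_data(path: str = DATA_PATH) -> pd.DataFrame:
    """Read the CSV and standardize its column names."""
    return standardize_columns(pd.read_csv(path))


# Usage in the notebook, once the path points at your copy of the file:
# customer_df = load_customer_data()
```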
-
Find all of the categorical data. Save it in a categorical_df variable.
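
One possible way to do this is to keep every non-numeric column. The sketch below uses a small toy frame in place of the lab data; the `income` column and all values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the lab data; column values are illustrative only.
toy_df = pd.DataFrame({
    "policy_type": ["Personal Auto", "Corporate Auto", "Personal Auto"],
    "policy": ["Personal L3", "Corporate L1", "Personal L1"],
    "income": [100, 0, 250],  # hypothetical numeric column
})

# Keep every column that is not numeric.
categorical_df = toy_df.select_dtypes(exclude="number")
print(categorical_df.columns.tolist())
```

Excluding numeric dtypes rather than including `object` is a design choice: it keeps working if a pandas version stores text in a dedicated string dtype.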
-
Check for NaN values.
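
A short sketch of the NaN check, again on a toy frame with deliberately missing entries (the values are made up for illustration):

```python
import pandas as pd

# Toy frame with one missing value per column.
categorical_df = pd.DataFrame({
    "policy_type": ["Personal Auto", None, "Corporate Auto"],
    "policy": ["Personal L3", "Corporate L1", None],
})

# Count of missing values per column.
nan_counts = categorical_df.isna().sum()
# Share of missing values per column, often easier to judge than raw counts.
nan_share = categorical_df.isna().mean().round(2)

print(nan_counts)
print(nan_share)
```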
-
Check all unique values of columns.
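
The unique values can be collected per column, for example into a dictionary. This is a sketch on toy data; on the real `categorical_df` the same loop applies unchanged.

```python
import pandas as pd

# Toy stand-in for categorical_df; values are illustrative only.
categorical_df = pd.DataFrame({
    "policy_type": ["Personal Auto", "Corporate Auto", "Personal Auto"],
    "policy": ["Personal L3", "Corporate L1", "Personal L1"],
})

# Map each column name to its sorted list of distinct non-missing values.
unique_values = {
    col: sorted(categorical_df[col].dropna().unique())
    for col in categorical_df.columns
}

for col, values in unique_values.items():
    print(f"{col}: {values}")
```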
-
Check dtypes. Do they all make sense as categorical data?
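
A sketch of the dtype check. A typical case of a column that is stored as text but is not really categorical is a date. The column name `effective_to_date` and its month/day/year format are assumptions here, not something the lab text states.

```python
import pandas as pd

# Toy stand-in; the date column and its format are hypothetical.
categorical_df = pd.DataFrame({
    "policy_type": ["Personal Auto", "Corporate Auto"],
    "effective_to_date": ["02/24/11", "01/31/11"],
})

# Every column here is text, including the dates.
print(categorical_df.dtypes)

# Parse the date strings so they can be handled as dates rather than categories.
parsed_dates = pd.to_datetime(categorical_df["effective_to_date"], format="%m/%d/%y")
print(parsed_dates.dtype)
```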
-
Does any column contain both alphabetic and numeric data? Decide how to clean it.
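
One way to find such columns is to test each value for the presence of both a letter and a digit. The helper `has_letters_and_digits`, the `customer` column, and the toy values below are hypothetical.

```python
import pandas as pd

# Toy stand-in; the customer column and all values are illustrative only.
toy_df = pd.DataFrame({
    "customer": ["AB123", "CD456"],
    "policy_type": ["Personal Auto", "Corporate Auto"],
    "policy": ["Personal L3", "Corporate L1"],
})


def has_letters_and_digits(series: pd.Series) -> bool:
    """True if at least one value contains both a letter and a digit."""
    text = series.dropna().astype(str)
    mixed = text.str.contains(r"[A-Za-z]") & text.str.contains(r"\d")
    return bool(mixed.any())


mixed_cols = [col for col in toy_df.columns if has_letters_and_digits(toy_df[col])]
print(mixed_cols)
```

Whether a flagged column needs cleaning depends on its meaning: an identifier can usually be dropped before encoding, while a label that carries a numeric level may be worth splitting.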
-
Would you choose to do anything else to clean or wrangle the categorical data? Comment on your decisions.
-
Compare policy_type and policy. What information is contained in these columns? Can you identify what is important?
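
A cross-tabulation is one way to compare the two columns. The sketch below assumes values shaped like "Personal Auto" and "Personal L3"; check the real unique values before relying on that pattern.

```python
import pandas as pd

# Toy stand-in; the value patterns are an assumption about the real data.
toy_df = pd.DataFrame({
    "policy_type": ["Personal Auto", "Personal Auto", "Corporate Auto", "Special Auto"],
    "policy": ["Personal L1", "Personal L3", "Corporate L2", "Special L1"],
})

# Rows: policy_type, columns: policy. Non-zero cells show which pairs occur.
overlap = pd.crosstab(toy_df["policy_type"], toy_df["policy"])
print(overlap)

# Does the first word of policy always repeat the first word of policy_type?
prefix_matches = (
    toy_df["policy"].str.split().str[0] == toy_df["policy_type"].str.split().str[0]
).all()

# If so, only the trailing level in policy carries new information.
policy_level = toy_df["policy"].str.split().str[-1]
print(prefix_matches, policy_level.tolist())
```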
-
Check the number of unique values in each column. Can any of them be combined to make encoding easier? Comment on your thoughts and make those changes.
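
As a sketch of one option, rare categories can be lumped into a single label so that encoding produces fewer columns. The helper `lump_rare`, the `vehicle_class` column, its values, and the threshold are all hypothetical; other groupings may suit the real data better.

```python
import pandas as pd

# Toy stand-in; column name and values are illustrative only.
toy_df = pd.DataFrame({
    "vehicle_class": ["Four-Door Car"] * 3 + ["SUV"] * 2 + ["Luxury Car", "Luxury SUV"],
})

# Number of distinct values per column.
print(toy_df.nunique())


def lump_rare(series: pd.Series, min_count: int, other_label: str = "Other") -> pd.Series:
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other_label)


combined = lump_rare(toy_df["vehicle_class"], min_count=2)
print(combined.value_counts())
```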