Lab | Revisiting Machine Learning Case Study

In this lab, you will use learningSet.csv file which you already have cloned in today's activities.

Complete the following steps on the categorical columns in the dataset:

Check for null values in all the columns
Exclude the following variables by looking at the definitions. Create a new empty list called drop_list. We will append this list and then drop all the columns in this list later:
- OSOURCE - symbol definitions not provided, too many categories
- ZIP CODE - we are including state already
Identify columns that over 85% missing values
Remove those columns from the dataframe
Reduce the number of categories in the column GENDER. The column should only have either "M" for males, "F" for females, and "other" for all the rest
- Note that there are a few null values in the column. We will first replace those null values using the code below:
```
print(categorical['GENDER'].value_counts())
categorical['GENDER'] = categorical['GENDER'].fillna('F')
```

prisciorne/lab-revisiting-machine-learning