Lab | Revisiting Machine Learning Case Study

In this lab, you will use the learningSet.csv file, which you already cloned in today's activities. The full process for the week is shown in the PDF file.

Complete the following steps on the categorical columns in the dataset:

Check for null values in all the columns
Exclude the following variables based on their definitions. Create a new empty list called drop_list. We will append column names to this list and then drop all the columns in it later:
- OSOURCE - symbol definitions not provided, too many categories
- ZIP - we are including state already
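The null check and the drop list could be sketched as follows. This is only an illustration on a small made-up frame standing in for the categorical columns of learningSet.csv; the sample values and the STATE column name are assumptions for the demo, not taken from the real data.

```python
import pandas as pd

# Toy stand-in for the categorical columns (all values are made up).
categorical = pd.DataFrame({
    "OSOURCE": ["GRI", "BOA", None, "AMH"],
    "ZIP": ["61081", "91326", "27017", "95953"],
    "STATE": ["IL", "CA", "NC", "CA"],   # assumed column name
    "GENDER": ["F", "M", None, "F"],
})

# Count null values in every column.
null_counts = categorical.isna().sum()
print(null_counts)

# Start with an empty list and append the columns excluded by definition.
drop_list = []
drop_list.append("OSOURCE")  # symbol definitions not provided, too many categories
drop_list.append("ZIP")      # state is already included
print(drop_list)
```

The columns stay in the dataframe at this point; the list only records what to drop later, for example with `categorical.drop(columns=drop_list)`.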
Identify columns that have over 50% missing values.
Remove those columns from the dataframe
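One possible way to find and remove the mostly-missing columns is sketched below on a made-up frame. The column names `col_a`, `col_b`, `col_c` and the variable names are hypothetical; "over 50%" is read here as strictly greater than 0.5.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with different shares of missing data.
df = pd.DataFrame({
    "col_a": ["x", "y", "z", "x"],              # 0% missing
    "col_b": [np.nan, np.nan, np.nan, "q"],     # 75% missing
    "col_c": ["u", np.nan, np.nan, "v"],        # exactly 50% missing
})

# Fraction of missing values per column.
missing_share = df.isna().mean()

# Columns with strictly more than 50% missing values.
mostly_missing = missing_share[missing_share > 0.5].index.tolist()

# Remove those columns from the dataframe.
df = df.drop(columns=mostly_missing)
print(mostly_missing)
print(df.columns.tolist())
```

A column that is exactly half empty, like `col_c` here, is kept under this reading of the threshold.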
Perform all of the cleaning processes from the Lesson.
Reduce the number of categories in the column GENDER. The column should contain only "M" for males, "F" for females, and "other" for everything else.
- Note that there are a few null values in the column. We will first replace those null values using the code below:
```
# Inspect the current categories and their frequencies
print(categorical['GENDER'].value_counts())
# Replace the few null values with 'F'
categorical['GENDER'] = categorical['GENDER'].fillna('F')
```
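After the nulls are filled, the remaining categories could be collapsed along these lines. This is a minimal sketch on a made-up GENDER column; the extra codes (`"U"`, `"J"`, `" "`) are invented for the demo, and `Series.where` is just one of several ways to do the mapping.

```python
import pandas as pd

# Toy GENDER column (made-up values) with a null and several rare codes.
categorical = pd.DataFrame({"GENDER": ["F", "M", None, "U", "J", "F", " ", "M"]})

# Fill nulls as in the snippet above.
categorical["GENDER"] = categorical["GENDER"].fillna("F")

# Keep "M" and "F" as they are; replace every other value with "other".
categorical["GENDER"] = categorical["GENDER"].where(
    categorical["GENDER"].isin(["M", "F"]), "other"
)
print(categorical["GENDER"].value_counts())
```

Running `value_counts()` again afterwards is a quick check that only the three expected categories remain.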
