This allows us to tell pandas that the categories have a logical order, by setting the ordered argument equal to True. The order itself is set with the categories parameter: whichever order we list the categories in becomes the order of the categories going forward.
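A minimal sketch of an ordered categorical, assuming a small hypothetical Series of size labels (the values and their order are illustrative only). It shows that categories= fixes the order and ordered=True makes comparisons and min/max meaningful:

```python
import pandas as pd

# Hypothetical survey answers; the values and their order are illustrative only.
sizes = pd.Series(["medium", "small", "large", "medium"])

# categories= fixes the order (small < medium < large); ordered=True tells
# pandas that this order is meaningful, enabling comparisons and min/max.
sizes = pd.Series(
    pd.Categorical(sizes, categories=["small", "medium", "large"], ordered=True)
)

print(sizes.cat.categories.tolist())  # ['small', 'medium', 'large']
print(sizes.cat.ordered)              # True
print(sizes.min(), sizes.max())       # small large
print((sizes > "small").tolist())     # [True, False, True, True]
```

Without ordered=True, the comparison sizes > "small" would raise a TypeError, because an unordered categorical has no notion of "greater than".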
Why use categorical data: memory
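There are a few reasons why storing a pandas Series with a categorical dtype is useful. The first is that it can save a great deal of memory: each distinct value is stored only once, and every row holds a small integer code pointing to its category.
Since pandas loads all the data into our computer's memory by default, reducing the memory footprint can be helpful when dealing with large datasets.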
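To see the saving, we can compare the memory used by the same column before and after converting it to categorical. This is a sketch with a hypothetical column of 300,000 rows drawn from only three distinct strings; the exact byte counts depend on the pandas version, so none are shown:

```python
import pandas as pd

# Hypothetical column: 300,000 rows drawn from only three distinct strings.
colors = pd.Series(["red", "green", "blue"] * 100_000)

# deep=True counts the actual string data, not just the pointers to it.
as_strings = colors.memory_usage(deep=True)
as_category = colors.astype("category").memory_usage(deep=True)

print(f"default dtype: {as_strings:,} bytes")
print(f"category:      {as_category:,} bytes")
print(f"ratio:         {as_strings / as_category:.1f}x smaller")
```

The fewer distinct values a column has relative to its length, the bigger the saving; a column where almost every value is unique gains little or nothing.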
Specify dtypes when reading data
If we know the data types of columns before reading in a dataset, it is good practice to specify the dtypes of at least some of the columns. This can be done by creating a dictionary with column names as keys and data types as values, and passing it to the dtype parameter of the reading function (for example, pd.read_csv).
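A minimal sketch of passing a dtype dictionary to pd.read_csv. The CSV here is a hypothetical in-memory stand-in (via io.StringIO) for a file on disk, and the column names are illustrative:

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file on disk.
csv_data = io.StringIO(
    "name,sex,breed,age\n"
    "Rex,male,Labrador,3\n"
    "Bella,female,Poodle,5\n"
    "Max,male,Labrador,2\n"
)

# Keys are column names, values are the dtypes to use while parsing.
# Columns not listed (here: name, age) are still inferred by pandas.
column_types = {"sex": "category", "breed": "category"}

dogs_sample = pd.read_csv(csv_data, dtype=column_types)
print(dogs_sample.dtypes)
```

Specifying only the columns we are sure about keeps the dictionary short while still avoiding a full object-dtype copy of the repetitive columns.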
By supplying a dictionary of key-value pairs, where each key is a current category and each value is the desired category, we can rename categories quickly: Series.cat.rename_categories(new_categories=dict)
First, make a dictionary: my_changes = {'unknown mix': 'Unknown'}
If we need to collapse categories, the .replace() method is quick and easy, but we will need to convert the column back to categorical.
dogs['breed'] = dogs['breed'].cat.rename_categories(my_changes)
dogs['breed'].value_counts()
# renaming using a lambda function
dogs['sex'] = dogs['sex'].cat.rename_categories(lambda c: c.title())
dogs['sex'].cat.categories
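The dogs DataFrame is not defined in these notes, so here is a self-contained sketch of both renaming styles, assuming two small hypothetical Series in place of dogs['breed'] and dogs['sex']:

```python
import pandas as pd

# Small hypothetical stand-ins for dogs['breed'] and dogs['sex'].
breed = pd.Series(["Labrador", "unknown mix", "Poodle", "unknown mix"], dtype="category")
sex = pd.Series(["male", "female", "male", "female"], dtype="category")

# Dictionary form: keys are current categories, values are the new names.
# Categories missing from the dict keep their current name.
breed = breed.cat.rename_categories({"unknown mix": "Unknown"})
print(breed.cat.categories.tolist())  # ['Labrador', 'Poodle', 'Unknown']

# Callable form: the function is applied to every category name.
sex = sex.cat.rename_categories(lambda c: c.title())
print(sex.cat.categories.tolist())    # ['Female', 'Male']
```

Because only the category labels change, the per-row integer codes are untouched, which is why renaming is fast even on large columns.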
Common replacement issues
There are two common issues when renaming categories. First, the new category must not already be in the list of categories. Second, we cannot use this method to collapse categories.
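A sketch of both issues and of the .replace() approach for collapsing categories mentioned earlier, assuming a hypothetical breed column where "Lab" and "Labrador" should be merged. Converting to object before replacing is our own choice here, to sidestep version-dependent behaviour of .replace() on categorical columns:

```python
import pandas as pd

# Hypothetical breeds: two spellings of one breed that we want to merge.
breed = pd.Series(["Labrador", "Lab", "Poodle", "Lab"], dtype="category")

# Renaming "Lab" to an existing category would create duplicate categories,
# which pandas rejects with a ValueError.
try:
    breed.cat.rename_categories({"Lab": "Labrador"})
    rename_failed = False
except ValueError as err:
    rename_failed = True
    print("rename_categories failed:", err)

# Collapse instead: replace values on a plain (object) copy,
# then convert the column back to categorical.
collapsed = breed.astype("object").replace({"Lab": "Labrador"}).astype("category")
print(collapsed.cat.categories.tolist())  # ['Labrador', 'Poodle']
print(collapsed.value_counts().to_dict())
```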