Dictionaries

Another built-in Python data structure is the dictionary. Dictionaries create mappings between keys and values, like so:
dict1 = {keyX : valueX, keyC : valueC, keyQ : valueQ}

You could also create the same dictionary by passing a list of (key, value) pairs to dict():
dict1 = dict([(keyX, valueX), (keyC, valueC), (keyQ, valueQ)])

Unlike strings or lists, dictionaries are not indexed by position. There are ways to iterate through a dictionary, and since Python 3.7 the items come back in the order they were inserted, but older versions of Python did not guarantee any particular order. Dictionaries are therefore best suited to finding the value associated with a specific key rather than the item in a specific place.
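
As a minimal sketch of the points above, the snippet below builds the same dictionary two ways and then looks values up by key. The variable names and the part prices are made up purely for illustration.

```python
# Hypothetical example data: part names mapped to prices
prices = {'brick': 0.10, 'plate': 0.07, 'tile': 0.05}

# dict() accepts an iterable of (key, value) pairs and builds an equal dictionary
same_prices = dict([('brick', 0.10), ('plate', 0.07), ('tile', 0.05)])
print(prices == same_prices)

# Look a value up by its key rather than by its position
print(prices['plate'])

# .get() returns a default instead of raising a KeyError when a key is missing
print(prices.get('wheel', 'not found'))

# Iterate over the key-value pairs
for name, price in prices.items():
    print(name, price)
```

Using .get() with a default is a common way to avoid errors when you are not sure a key exists.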

Let's look at a couple of examples in practice:

Importing Packages and Data

As usual, let's start by importing pandas and some data that we will be working with.

import pandas as pd
df = pd.read_csv('lego_sets.csv')
df.head()
ages list_price num_reviews piece_count play_star_rating prod_desc prod_id prod_long_desc review_difficulty set_name star_rating theme_name val_star_rating country
0 6-12 29.99 2.0 277.0 4.0 Catapult into action and take back the eggs fr... 75823.0 Use the staircase catapult to launch Red into ... Average Bird Island Egg Heist 4.5 Angry Birds™ 4.0 US
1 6-12 19.99 2.0 168.0 4.0 Launch a flying attack and rescue the eggs fro... 75822.0 Pilot Pig has taken off from Bird Island with ... Easy Piggy Plane Attack 5.0 Angry Birds™ 4.0 US
2 6-12 12.99 11.0 74.0 4.3 Chase the piggy with lightning-fast Chuck and ... 75821.0 Pitch speedy bird Chuck against the Piggy Car.... Easy Piggy Car Escape 4.3 Angry Birds™ 4.1 US
3 12+ 99.99 23.0 1032.0 3.6 Explore the architecture of the United States ... 21030.0 Discover the architectural secrets of the icon... Average United States Capitol Building 4.6 Architecture 4.3 US
4 12+ 79.99 14.0 744.0 3.2 Recreate the Solomon R. Guggenheim Museum® wit... 21035.0 Discover the architectural secrets of Frank Ll... Challenging Solomon R. Guggenheim Museum® 4.6 Architecture 4.1 US

Using Dictionaries to Rename Column Values

A common use case for dictionaries is recoding the values in a column. For example, let's say that we wanted to convert the review_difficulty labels to a quantitative scale.

# Get previous values
df.review_difficulty.unique()
array(['Average', 'Easy', 'Challenging', 'Very Easy', nan,
       'Very Challenging'], dtype=object)

Notice the nan value above, which represents null or blank values. We could translate these difficulty ratings into a quantitative scale like this:

diff_dict = {'Very Easy' : 1, 'Easy' : 2, 'Average' : 3, 'Challenging' : 4, 'Very Challenging' : 5}

We could then create a new column (or update the current column) using that dictionary:

df['Difficulty_Rating'] = df.review_difficulty.map(diff_dict)
df.head() #Preview changes
ages list_price num_reviews piece_count play_star_rating prod_desc prod_id prod_long_desc review_difficulty set_name star_rating theme_name val_star_rating country Difficulty_Rating
0 6-12 29.99 2.0 277.0 4.0 Catapult into action and take back the eggs fr... 75823.0 Use the staircase catapult to launch Red into ... Average Bird Island Egg Heist 4.5 Angry Birds™ 4.0 US 3.0
1 6-12 19.99 2.0 168.0 4.0 Launch a flying attack and rescue the eggs fro... 75822.0 Pilot Pig has taken off from Bird Island with ... Easy Piggy Plane Attack 5.0 Angry Birds™ 4.0 US 2.0
2 6-12 12.99 11.0 74.0 4.3 Chase the piggy with lightning-fast Chuck and ... 75821.0 Pitch speedy bird Chuck against the Piggy Car.... Easy Piggy Car Escape 4.3 Angry Birds™ 4.1 US 2.0
3 12+ 99.99 23.0 1032.0 3.6 Explore the architecture of the United States ... 21030.0 Discover the architectural secrets of the icon... Average United States Capitol Building 4.6 Architecture 4.3 US 3.0
4 12+ 79.99 14.0 744.0 3.2 Recreate the Solomon R. Guggenheim Museum® wit... 21035.0 Discover the architectural secrets of Frank Ll... Challenging Solomon R. Guggenheim Museum® 4.6 Architecture 4.1 US 4.0
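
To see how map treats values that have no matching key, here is a small sketch on a stand-in Series. The Series contents, including the 'Impossible' label, are hypothetical and not taken from the LEGO data.

```python
import pandas as pd

# Hypothetical stand-in for the review_difficulty column
difficulty = pd.Series(['Easy', 'Average', None, 'Impossible'])

diff_dict = {'Very Easy': 1, 'Easy': 2, 'Average': 3,
             'Challenging': 4, 'Very Challenging': 5}

# Values without a matching key (the missing value and 'Impossible') become NaN
rating = difficulty.map(diff_dict)
print(rating)

# Count how many rows did not receive a rating
print(rating.isna().sum())
```

Because NaN is a floating point value, the presence of unmapped rows is also why the new ratings are displayed as 3.0 and 2.0 rather than as whole numbers.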

Creating Dictionaries from DataFrames

You can also quickly create dictionaries from another dataset. Let's say we want the full name of countries listed under the country column.

df.country.value_counts()[:5]
US    817
CA    815
GB    576
NL    576
DN    575
Name: country, dtype: int64
# Pull in a second dataset containing country codes
countries = pd.read_csv('Country_Codes.csv')
countries.head()
COUNTRY A2 (ISO) A3 (UN) NUM (UN) DIALING CODE
0 Afghanistan AF AFG 4 93
1 Albania AL ALB 8 355
2 Algeria DZ DZA 12 213
3 American Samoa AS ASM 16 01/01/84
4 Andorra AD AND 20 376
# Create a dictionary
# The zip function is a neat little tool for pairing each entry from two columns together (like a zipper!)
# We then just wrap that in the dict() function.
country_dict = dict(zip(countries['A2 (ISO)'], countries['COUNTRY']))
# Map it to our original dataset (you can also do this with pd.merge() when joining multiple fields)
df['Country_Full_Name'] = df.country.map(country_dict)
df.head(2)
ages list_price num_reviews piece_count play_star_rating prod_desc prod_id prod_long_desc review_difficulty set_name star_rating theme_name val_star_rating country Difficulty_Rating Country_Full_Name
0 6-12 29.99 2.0 277.0 4.0 Catapult into action and take back the eggs fr... 75823.0 Use the staircase catapult to launch Red into ... Average Bird Island Egg Heist 4.5 Angry Birds™ 4.0 US 3.0 United States
1 6-12 19.99 2.0 168.0 4.0 Launch a flying attack and rescue the eggs fro... 75822.0 Pilot Pig has taken off from Bird Island with ... Easy Piggy Plane Attack 5.0 Angry Birds™ 4.0 US 2.0 United States
df.Country_Full_Name.value_counts()[:5]
United States     817
Canada            815
Netherlands       576
United Kingdom    576
Austria           575
Name: Country_Full_Name, dtype: int64
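
The dict(zip(...)) pattern used above can be illustrated with two plain lists. This is only a sketch: the lists below are hypothetical stand-ins for the two DataFrame columns.

```python
# Hypothetical codes and names, standing in for two DataFrame columns
codes = ['US', 'CA', 'GB']
names = ['United States', 'Canada', 'United Kingdom']

# zip() pairs the items position by position
pairs = list(zip(codes, names))
print(pairs)

# dict() turns those pairs into a lookup table
code_to_name = dict(pairs)
print(code_to_name['CA'])

# A code missing from the lookup table has no entry; .get() returns None for it
print(code_to_name.get('DN'))
```

When such a dictionary is passed to map, any code absent from the lookup table is left as NaN, so it is worth comparing the value counts before and after mapping.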

Custom Agg Functions

We can also use dictionaries along with the pandas groupby method to apply different aggregations to different columns:

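# Keys are column names; values are the aggregation(s) to apply to that column.
# String names such as 'mean' and 'std' select the built-in pandas aggregations.
agg_dict = {'ages' : 'max',
            'Difficulty_Rating' : ['mean', 'std'],
            'Country_Full_Name' : lambda x: x.value_counts().index[0],
            'num_reviews' : ['mean', 'max']}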
df.groupby('theme_name').agg(agg_dict)
ages Difficulty_Rating Country_Full_Name num_reviews
max mean std <lambda> mean max
theme_name
Angry Birds™ 6-12 2.333333 0.516398 United States 5.000000 11.0
Architecture 12+ 3.000000 0.448282 Finland 21.300000 53.0
BOOST 7-12 3.000000 0.000000 Finland 63.000000 63.0
Blue's Helicopter Pursuit 7-12 NaN NaN Finland NaN NaN
BrickHeadz 10+ 1.782673 0.606140 Canada 3.219917 14.0
Carnotaurus Gyrosphere Escape 7-12 3.000000 0.000000 Finland 1.714286 2.0
City 8-12 2.166335 0.755981 Canada 12.892176 89.0
Classic 4-99 1.700680 0.695328 New Zealand 21.054313 180.0
Creator 3-in-1 9-14 2.511811 0.500518 United States 7.842520 27.0
Creator Expert 16+ 3.728707 0.445330 Canada 123.526814 337.0
DC Comics™ Super Heroes 9-14 2.621622 0.486629 Canada 7.945946 47.0
DC Super Hero Girls 9-12 2.833333 0.380693 United States 3.000000 7.0
DIMENSIONS™ 7-14 2.195312 0.501896 Netherlands 3.929688 13.0
DUPLO® 2-5 2.066798 0.717784 Canada 4.117133 22.0
Dilophosaurus Outpost Attack 7-12 2.000000 0.000000 Finland 1.809524 2.0
Disney™ 6-12 2.815385 0.711566 Canada 20.234615 171.0
Elves 9-12 2.849673 0.358565 Canada 3.010256 8.0
Friends 8-12 2.808853 0.636334 Canada 3.904031 18.0
Ghostbusters™ 8-14 3.913043 0.288104 Canada 120.782609 130.0
Ideas 9+ 2.671875 0.743575 Canada 101.757812 367.0
Indoraptor Rampage at Lockwood Estate 8-12 2.000000 0.000000 Finland 2.952381 3.0
Juniors 5-8 2.058932 0.751523 Canada 2.283951 8.0
Jurassic Park Velociraptor Chase 6-12 3.000000 0.000000 Finland 4.000000 4.0
LEGO® Creator 3-in-1 8-12 2.533333 0.516398 Canada 4.666667 15.0
MINDSTORMS® 8+ 2.377246 1.225319 France 6.648936 38.0
Marvel Super Heroes 8-14 2.389785 0.520422 United States 11.919355 54.0
Minecraft™ 8+ 2.754789 0.713352 Canada 5.544061 16.0
Minifigures 5+ 1.000000 0.000000 Canada 14.785714 26.0
NEXO KNIGHTS™ 9-14 3.008621 0.982417 Canada 2.950820 17.0
NINJAGO® 9-14 2.849162 0.902330 Canada 9.513966 61.0
Power Functions 9-16 2.000000 0.000000 Finland 57.000000 57.0
Pteranodon Chase 6-12 2.476190 0.511766 Finland 2.238095 3.0
SERIOUS PLAY® 6+ 2.250000 0.435613 Finland 6.750000 10.0
Speed Champions 8-14 2.338583 0.474162 Canada 8.968504 28.0
Star Wars™ 9-14 2.441755 0.883020 United States 25.908472 201.0
Stygimoloch Breakout 6-12 3.000000 0.000000 Finland 2.000000 2.0
T. rex Transport 7-12 NaN NaN Finland NaN NaN
THE LEGO® BATMAN MOVIE 9-14 2.773469 0.596265 Canada 12.312030 27.0
THE LEGO® NINJAGO® MOVIE™ 9-14 2.739623 0.712520 Australia 17.665409 88.0
Technic 9-16 2.869835 0.809876 Canada 18.768317 143.0
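
The same idea can be tried on a much smaller table. The sketch below uses a made-up four-row DataFrame (the values are hypothetical) to show how a dictionary passed to agg produces two-level column labels like those in the output above.

```python
import pandas as pd

# Hypothetical miniature version of the LEGO data
toy = pd.DataFrame({
    'theme_name': ['City', 'City', 'Technic', 'Technic'],
    'num_reviews': [10, 20, 5, 7],
    'list_price': [9.99, 19.99, 49.99, 99.99],
})

# Keys are column names; values are one aggregation or a list of them
agg_dict = {'num_reviews': ['mean', 'max'], 'list_price': 'max'}
summary = toy.groupby('theme_name').agg(agg_dict)
print(summary)

# Because a list was used, columns are labeled by (column, aggregation) pairs
print(summary[('num_reviews', 'mean')])
```

Selecting a result column therefore takes a tuple such as ('num_reviews', 'mean') rather than a single name.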

Data Transformation

Create a dictionary that rebins the ages column into the following age ranges: Under 5, 5-8, 8-12, 12+

* If an age range could fall into more than one bin, default to the higher age bin.

# Your code here

Data Visualization

Create a bar graph depicting the number of LEGO sets in each category of the original ages column. Then create a second bar graph for the new age column you created. How do they compare?

# Your code here