Another built-in Python data structure is the dictionary. Dictionaries create mappings between keys and values, like so:
dict1 = {'keyX': 'valueX', 'keyC': 'valueC', 'keyQ': 'valueQ'}
You could also create the same dictionary with the dict() constructor, passing each key as a keyword argument:
dict1 = dict(keyX='valueX', keyC='valueC', keyQ='valueQ')
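The construction styles can be compared directly in a short sketch. The keys and values below are placeholder strings, not real data. One thing worth knowing: the keyword form of dict() only works when every key is a valid Python identifier, because each keyword name becomes a string key. For other keys (numbers, strings with spaces) you need the literal form or a list of (key, value) pairs.

```python
# Three equivalent ways to build the same dictionary.
# Keys and values are placeholder strings for illustration only.
literal = {'keyX': 'valueX', 'keyC': 'valueC', 'keyQ': 'valueQ'}

# Keyword arguments: each keyword name becomes a string key.
from_keywords = dict(keyX='valueX', keyC='valueC', keyQ='valueQ')

# An iterable of (key, value) pairs works for any hashable key.
from_pairs = dict([('keyX', 'valueX'), ('keyC', 'valueC'), ('keyQ', 'valueQ')])

print(literal == from_keywords == from_pairs)  # True

# Integer keys cannot be keyword arguments, so use pairs (or a literal).
ratings = dict([(1, 'Very Easy'), (5, 'Very Challenging')])
print(ratings[5])  # Very Challenging
```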
Unlike strings or lists, dictionaries are not indexed by position: you retrieve a value by its key, not by a number like 0 or 1. Since Python 3.7, dictionaries do remember the order in which items were inserted, and iterating over one returns items in that order, but position is not how you look items up. Instead, dictionaries are good for finding the value associated with a specific key rather than an item in a specific place.
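Here is a minimal sketch of key-based lookup, using a cut-down version of the difficulty mapping that appears later in this section. The 'Expert' key is made up purely to show what happens when a key is missing.

```python
difficulty = {'Very Easy': 1, 'Easy': 2, 'Average': 3}

# Look up a value by its key, not by position.
print(difficulty['Easy'])           # 2

# .get() returns None (or a default you supply) instead of raising an error.
print(difficulty.get('Expert'))     # None
print(difficulty.get('Expert', 0))  # 0

# Membership tests check keys, not values.
print('Average' in difficulty)      # True
print(3 in difficulty)              # False

# Since Python 3.7, iteration follows insertion order.
print(list(difficulty))             # ['Very Easy', 'Easy', 'Average']

# Square-bracket lookup of a missing key raises KeyError.
try:
    difficulty['Expert']
except KeyError as err:
    print('missing key:', err)      # missing key: 'Expert'
```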
Let's look at a couple of examples in practice:
As usual, let's start by importing pandas and loading the data we will be working with.
import pandas as pd
df = pd.read_csv('lego_sets.csv')
df.head()
| | ages | list_price | num_reviews | piece_count | play_star_rating | prod_desc | prod_id | prod_long_desc | review_difficulty | set_name | star_rating | theme_name | val_star_rating | country |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6-12 | 29.99 | 2.0 | 277.0 | 4.0 | Catapult into action and take back the eggs fr... | 75823.0 | Use the staircase catapult to launch Red into ... | Average | Bird Island Egg Heist | 4.5 | Angry Birds™ | 4.0 | US |
1 | 6-12 | 19.99 | 2.0 | 168.0 | 4.0 | Launch a flying attack and rescue the eggs fro... | 75822.0 | Pilot Pig has taken off from Bird Island with ... | Easy | Piggy Plane Attack | 5.0 | Angry Birds™ | 4.0 | US |
2 | 6-12 | 12.99 | 11.0 | 74.0 | 4.3 | Chase the piggy with lightning-fast Chuck and ... | 75821.0 | Pitch speedy bird Chuck against the Piggy Car.... | Easy | Piggy Car Escape | 4.3 | Angry Birds™ | 4.1 | US |
3 | 12+ | 99.99 | 23.0 | 1032.0 | 3.6 | Explore the architecture of the United States ... | 21030.0 | Discover the architectural secrets of the icon... | Average | United States Capitol Building | 4.6 | Architecture | 4.3 | US |
4 | 12+ | 79.99 | 14.0 | 744.0 | 3.2 | Recreate the Solomon R. Guggenheim Museum® wit... | 21035.0 | Discover the architectural secrets of Frank Ll... | Challenging | Solomon R. Guggenheim Museum® | 4.6 | Architecture | 4.1 | US |
A common use case for dictionaries is recoding the values in a column. For example, let's say we want to convert the text labels in review_difficulty to a numeric scale.
# Get previous values
df.review_difficulty.unique()
array(['Average', 'Easy', 'Challenging', 'Very Easy', nan,
'Very Challenging'], dtype=object)
Notice the nan value above, which represents null (missing) values. We could translate these difficulty ratings into a numeric scale like this:
diff_dict = {'Very Easy' : 1, 'Easy' : 2, 'Average' : 3, 'Challenging' : 4, 'Very Challenging' : 5}
We could then create a new column (or update the current column) using that dictionary:
df['Difficulty_Rating'] = df.review_difficulty.map(diff_dict)
df.head() #Preview changes
| | ages | list_price | num_reviews | piece_count | play_star_rating | prod_desc | prod_id | prod_long_desc | review_difficulty | set_name | star_rating | theme_name | val_star_rating | country | Difficulty_Rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6-12 | 29.99 | 2.0 | 277.0 | 4.0 | Catapult into action and take back the eggs fr... | 75823.0 | Use the staircase catapult to launch Red into ... | Average | Bird Island Egg Heist | 4.5 | Angry Birds™ | 4.0 | US | 3.0 |
1 | 6-12 | 19.99 | 2.0 | 168.0 | 4.0 | Launch a flying attack and rescue the eggs fro... | 75822.0 | Pilot Pig has taken off from Bird Island with ... | Easy | Piggy Plane Attack | 5.0 | Angry Birds™ | 4.0 | US | 2.0 |
2 | 6-12 | 12.99 | 11.0 | 74.0 | 4.3 | Chase the piggy with lightning-fast Chuck and ... | 75821.0 | Pitch speedy bird Chuck against the Piggy Car.... | Easy | Piggy Car Escape | 4.3 | Angry Birds™ | 4.1 | US | 2.0 |
3 | 12+ | 99.99 | 23.0 | 1032.0 | 3.6 | Explore the architecture of the United States ... | 21030.0 | Discover the architectural secrets of the icon... | Average | United States Capitol Building | 4.6 | Architecture | 4.3 | US | 3.0 |
4 | 12+ | 79.99 | 14.0 | 744.0 | 3.2 | Recreate the Solomon R. Guggenheim Museum® wit... | 21035.0 | Discover the architectural secrets of Frank Ll... | Challenging | Solomon R. Guggenheim Museum® | 4.6 | Architecture | 4.1 | US | 4.0 |
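Because review_difficulty contains nan, it is worth seeing exactly what map() does with values that are not dictionary keys. The sketch below uses a small made-up Series rather than the LEGO data, with a hypothetical 'Expert' label added. It also contrasts map() with Series.replace(), which may suit updating the existing column, since values that are not keys are kept as-is instead of becoming NaN.

```python
import numpy as np
import pandas as pd

diff_dict = {'Very Easy': 1, 'Easy': 2, 'Average': 3,
             'Challenging': 4, 'Very Challenging': 5}

# A toy Series standing in for df.review_difficulty ('Expert' is made up).
labels = pd.Series(['Easy', 'Average', np.nan, 'Expert'])

# map(): anything that is not a key (NaN and 'Expert' here) becomes NaN,
# which is why the mapped column holds floats (3.0) rather than ints.
mapped = labels.map(diff_dict)
print(mapped.tolist())    # [2.0, 3.0, nan, nan]

# replace(): values that are not keys are left unchanged.
replaced = labels.replace(diff_dict)
print(replaced.tolist())  # [2, 3, nan, 'Expert']
```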
You can also quickly build a dictionary from another dataset. Let's say we want the full country name for each code in the country column.
df.country.value_counts()[:5]
US 817
CA 815
GB 576
NL 576
DN 575
Name: country, dtype: int64
#Pull in a second dataset that lists country codes alongside country names
countries = pd.read_csv('Country_Codes.csv')
countries.head()
| | COUNTRY | A2 (ISO) | A3 (UN) | NUM (UN) | DIALING CODE |
---|---|---|---|---|---|
0 | Afghanistan | AF | AFG | 4 | 93 |
1 | Albania | AL | ALB | 8 | 355 |
2 | Algeria | DZ | DZA | 12 | 213 |
3 | American Samoa | AS | ASM | 16 | 01/01/84 |
4 | Andorra | AD | AND | 20 | 376 |
#Create a dictionary
#zip() is a built-in function that pairs each entry from two columns together (like a zipper!)
#We then just wrap those pairs in the dict() function.
country_dict = dict(zip(countries['A2 (ISO)'], countries['COUNTRY']))
#Map it to our original dataset (pd.merge() can do this too, and also handles joins on multiple fields)
df['Country_Full_Name'] = df.country.map(country_dict)
df.head(2)
| | ages | list_price | num_reviews | piece_count | play_star_rating | prod_desc | prod_id | prod_long_desc | review_difficulty | set_name | star_rating | theme_name | val_star_rating | country | Difficulty_Rating | Country_Full_Name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6-12 | 29.99 | 2.0 | 277.0 | 4.0 | Catapult into action and take back the eggs fr... | 75823.0 | Use the staircase catapult to launch Red into ... | Average | Bird Island Egg Heist | 4.5 | Angry Birds™ | 4.0 | US | 3.0 | United States |
1 | 6-12 | 19.99 | 2.0 | 168.0 | 4.0 | Launch a flying attack and rescue the eggs fro... | 75822.0 | Pilot Pig has taken off from Bird Island with ... | Easy | Piggy Plane Attack | 5.0 | Angry Birds™ | 4.0 | US | 2.0 | United States |
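To see what zip() and dict() are doing on their own, here is a sketch on a three-row stand-in for Country_Codes.csv. The rows are typed in by hand, not read from the file; only the column names match the real lookup table.

```python
import pandas as pd

# A tiny hand-made stand-in for the country-code lookup table.
countries = pd.DataFrame({
    'A2 (ISO)': ['US', 'CA', 'GB'],
    'COUNTRY': ['United States', 'Canada', 'United Kingdom'],
})

# zip() yields (code, name) tuples, one per row.
pairs = list(zip(countries['A2 (ISO)'], countries['COUNTRY']))
print(pairs[0])            # ('US', 'United States')

# dict() turns those pairs into a code -> name mapping.
country_dict = dict(zip(countries['A2 (ISO)'], countries['COUNTRY']))
print(country_dict['GB'])  # United Kingdom

# If a code appeared twice in the lookup, the later row would silently win.
dupes = dict(zip(['US', 'US'], ['first', 'second']))
print(dupes)               # {'US': 'second'}
```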
df.Country_Full_Name.value_counts()[:5]
United States 817
Canada 815
Netherlands 576
United Kingdom 576
Austria 575
Name: Country_Full_Name, dtype: int64
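map() returns NaN for any code that has no key in country_dict, and value_counts() drops NaN by default, so unmatched rows simply disappear from the counts above. For instance, DN is not an ISO 3166-1 alpha-2 code (Denmark is DK), so it may have no match in the lookup table. A sketch for listing unmatched codes, on made-up data with a deliberately invalid XX code:

```python
import pandas as pd

# Toy data: 'XX' is a made-up code with no entry in the lookup dictionary.
df = pd.DataFrame({'country': ['US', 'CA', 'XX', 'US']})
country_dict = {'US': 'United States', 'CA': 'Canada'}

df['Country_Full_Name'] = df.country.map(country_dict)

# value_counts() skips NaN unless dropna=False is passed.
print(df.Country_Full_Name.value_counts().sum())              # 3
print(df.Country_Full_Name.value_counts(dropna=False).sum())  # 4

# List the original codes whose mapped value is missing.
unmatched = df.loc[df.Country_Full_Name.isna(), 'country'].unique()
print(unmatched)  # ['XX']
```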
We can also use dictionaries along with the pandas groupby method to apply different aggregations to different columns:
agg_dict = {'ages' : 'max',  # 'max' of a text column is the alphabetically last value
            'Difficulty_Rating' : ['mean', 'std'],
            'Country_Full_Name' : lambda x: x.value_counts().index[0],  # most frequent country
            'num_reviews' : ['mean', 'max']}
df.groupby('theme_name').agg(agg_dict)
theme_name | ages (max) | Difficulty_Rating (mean) | Difficulty_Rating (std) | Country_Full_Name (<lambda>) | num_reviews (mean) | num_reviews (max) |
---|---|---|---|---|---|---|
Angry Birds™ | 6-12 | 2.333333 | 0.516398 | United States | 5.000000 | 11.0 |
Architecture | 12+ | 3.000000 | 0.448282 | Finland | 21.300000 | 53.0 |
BOOST | 7-12 | 3.000000 | 0.000000 | Finland | 63.000000 | 63.0 |
Blue's Helicopter Pursuit | 7-12 | NaN | NaN | Finland | NaN | NaN |
BrickHeadz | 10+ | 1.782673 | 0.606140 | Canada | 3.219917 | 14.0 |
Carnotaurus Gyrosphere Escape | 7-12 | 3.000000 | 0.000000 | Finland | 1.714286 | 2.0 |
City | 8-12 | 2.166335 | 0.755981 | Canada | 12.892176 | 89.0 |
Classic | 4-99 | 1.700680 | 0.695328 | New Zealand | 21.054313 | 180.0 |
Creator 3-in-1 | 9-14 | 2.511811 | 0.500518 | United States | 7.842520 | 27.0 |
Creator Expert | 16+ | 3.728707 | 0.445330 | Canada | 123.526814 | 337.0 |
DC Comics™ Super Heroes | 9-14 | 2.621622 | 0.486629 | Canada | 7.945946 | 47.0 |
DC Super Hero Girls | 9-12 | 2.833333 | 0.380693 | United States | 3.000000 | 7.0 |
DIMENSIONS™ | 7-14 | 2.195312 | 0.501896 | Netherlands | 3.929688 | 13.0 |
DUPLO® | 2-5 | 2.066798 | 0.717784 | Canada | 4.117133 | 22.0 |
Dilophosaurus Outpost Attack | 7-12 | 2.000000 | 0.000000 | Finland | 1.809524 | 2.0 |
Disney™ | 6-12 | 2.815385 | 0.711566 | Canada | 20.234615 | 171.0 |
Elves | 9-12 | 2.849673 | 0.358565 | Canada | 3.010256 | 8.0 |
Friends | 8-12 | 2.808853 | 0.636334 | Canada | 3.904031 | 18.0 |
Ghostbusters™ | 8-14 | 3.913043 | 0.288104 | Canada | 120.782609 | 130.0 |
Ideas | 9+ | 2.671875 | 0.743575 | Canada | 101.757812 | 367.0 |
Indoraptor Rampage at Lockwood Estate | 8-12 | 2.000000 | 0.000000 | Finland | 2.952381 | 3.0 |
Juniors | 5-8 | 2.058932 | 0.751523 | Canada | 2.283951 | 8.0 |
Jurassic Park Velociraptor Chase | 6-12 | 3.000000 | 0.000000 | Finland | 4.000000 | 4.0 |
LEGO® Creator 3-in-1 | 8-12 | 2.533333 | 0.516398 | Canada | 4.666667 | 15.0 |
MINDSTORMS® | 8+ | 2.377246 | 1.225319 | France | 6.648936 | 38.0 |
Marvel Super Heroes | 8-14 | 2.389785 | 0.520422 | United States | 11.919355 | 54.0 |
Minecraft™ | 8+ | 2.754789 | 0.713352 | Canada | 5.544061 | 16.0 |
Minifigures | 5+ | 1.000000 | 0.000000 | Canada | 14.785714 | 26.0 |
NEXO KNIGHTS™ | 9-14 | 3.008621 | 0.982417 | Canada | 2.950820 | 17.0 |
NINJAGO® | 9-14 | 2.849162 | 0.902330 | Canada | 9.513966 | 61.0 |
Power Functions | 9-16 | 2.000000 | 0.000000 | Finland | 57.000000 | 57.0 |
Pteranodon Chase | 6-12 | 2.476190 | 0.511766 | Finland | 2.238095 | 3.0 |
SERIOUS PLAY® | 6+ | 2.250000 | 0.435613 | Finland | 6.750000 | 10.0 |
Speed Champions | 8-14 | 2.338583 | 0.474162 | Canada | 8.968504 | 28.0 |
Star Wars™ | 9-14 | 2.441755 | 0.883020 | United States | 25.908472 | 201.0 |
Stygimoloch Breakout | 6-12 | 3.000000 | 0.000000 | Finland | 2.000000 | 2.0 |
T. rex Transport | 7-12 | NaN | NaN | Finland | NaN | NaN |
THE LEGO® BATMAN MOVIE | 9-14 | 2.773469 | 0.596265 | Canada | 12.312030 | 27.0 |
THE LEGO® NINJAGO® MOVIE™ | 9-14 | 2.739623 | 0.712520 | Australia | 17.665409 | 88.0 |
Technic | 9-16 | 2.869835 | 0.809876 | Canada | 18.768317 | 143.0 |
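A dictionary passed to agg() produces a two-level column index, as in the table above. The sketch below, on made-up toy rows whose column names merely mirror the LEGO data, compares that with pandas' named aggregation syntax (available since pandas 0.25), which gives flat column names of your choosing and avoids labels like <lambda>.

```python
import pandas as pd

# Made-up toy rows; only the column names mirror the LEGO data.
df = pd.DataFrame({
    'theme_name': ['City', 'City', 'Technic', 'Technic'],
    'num_reviews': [2.0, 10.0, 5.0, 7.0],
    'Difficulty_Rating': [2.0, 4.0, 3.0, 3.0],
})

# Dictionary form: keys are columns, values are one or more aggregations.
by_dict = df.groupby('theme_name').agg({'num_reviews': ['mean', 'max'],
                                        'Difficulty_Rating': 'mean'})
print(by_dict.columns.tolist())
# [('num_reviews', 'mean'), ('num_reviews', 'max'), ('Difficulty_Rating', 'mean')]

# Named aggregation: each keyword is an output column name and each value
# is a (column, aggregation) tuple, giving flat column names.
named = df.groupby('theme_name').agg(
    reviews_mean=('num_reviews', 'mean'),
    reviews_max=('num_reviews', 'max'),
    difficulty_mean=('Difficulty_Rating', 'mean'),
)
print(named.loc['City'].tolist())  # [6.0, 10.0, 3.0]
```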
Create a dictionary that rebins the ages column into the following age ranges: Under 5, 5-8, 8-12, 12+.
*If an original age range overlaps more than one new bin, default to the higher age bin.
# Your code here
Create a bar graph showing the number of LEGO sets in each original age range. Then create a second bar graph for the new age column you created. How do they compare?
# Your code here