This code challenge is designed to test your understanding of the Module 1 material. It covers:
- Pandas
- Data Visualization
- Exploring Statistical Data
- Python Data Structures
Read the instructions carefully. You will be asked both to write code and to respond to a few short answer questions.
For the short answer questions, please use your own words. The expectation is that you have not copied and pasted from an external source, even if you consult another source to help craft your response. While the short answer questions are not necessarily being assessed on grammatical correctness or sentence structure, you should do your best to communicate clearly.
In this section you will preprocess a dataset for the videogame FIFA19. The dataset contains both in-game data and information about the players' real-life careers.
# Run this cell without changes
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
The data you'll be working with is in a file called './data/fifa.csv'. Use your knowledge of pandas to create a new DataFrame, called df, using the data from this CSV file.
Check the contents of the first few rows of your DataFrame, then show the number of rows and columns in the DataFrame.
# Replace None with appropriate code
df = None
# Code here to check the first few rows of the DataFrame
# Code here to see the number of rows and columns in the DataFrame
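As a rough illustration of the load-and-inspect pattern, here is a minimal sketch on a tiny made-up CSV. The names `csv_text` and `toy_df` and the three sample rows are assumptions for illustration only; the challenge itself reads './data/fifa.csv' into df.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk (not the FIFA data)
csv_text = io.StringIO(
    "Name,Age,Nationality\n"
    "Player A,24,Country A\n"
    "Player B,31,Country B\n"
    "Player C,19,Country A\n"
)

# read_csv accepts a file path or any file-like object
toy_df = pd.read_csv(csv_text)

# head() returns the first rows (5 by default); shape is a (rows, columns) tuple
print(toy_df.head())
print(toy_df.shape)
```

With a real file, the path string would take the place of `csv_text`.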
Drop rows from the DataFrame for which "Release Clause" is missing. A release clause is the part of a soccer player's contract that deals with being bought out by another team. After you have dropped these rows, check how many rows remain in the DataFrame.
# Code here to drop rows from the DataFrame with missing values for 'Release Clause'
# Code here to check how many rows are left in the DataFrame
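One way to drop rows that are missing a value in a single column is `dropna` with the `subset` argument. The sketch below uses a made-up four-row table (`toy_df` and `cleaned` are hypothetical names), so the counts say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up data: two of the four rows have no release clause
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Release Clause": [1000.0, np.nan, 2500.0, np.nan],
})

# subset= restricts the missing-value check to the listed column(s);
# dropna returns a new DataFrame and leaves toy_df unchanged
cleaned = toy_df.dropna(subset=["Release Clause"])

# Number of rows that remain
print(len(cleaned))
```

Assigning the result back to the same variable (or passing `inplace=True`) is needed if the original DataFrame should lose those rows.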
Now that there are no missing values in the 'Release Clause' column, we can change its values from Euro to Dollar amounts.
Assume the current exchange rate is 1 Euro = 1.2 Dollars.
# Code here to convert the column of euros to dollars
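Because pandas arithmetic is applied element-wise, a unit conversion can be written as a single multiplication. This sketch assumes the column already holds numeric euro amounts; `EUR_TO_USD` and `toy_df` are illustrative names, and the three values are invented.

```python
import pandas as pd

# Exchange rate stated in the exercise: 1 Euro = 1.2 Dollars
EUR_TO_USD = 1.2

# Made-up euro amounts, assumed to be numeric already
toy_df = pd.DataFrame({"Release Clause": [1000.0, 2500.0, 400.0]})

# Multiplying a column by a scalar scales every row at once
toy_df["Release Clause"] = toy_df["Release Clause"] * EUR_TO_USD

print(toy_df)
```

If the column were stored as text (for example with a currency symbol), it would need to be parsed to numbers before this step.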
Continuing to use the same FIFA dataset, plot data using whichever plotting library you are most comfortable with.
# Run this cell without changes
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
2.1) Find the top 10 countries with the most players (using the 'Nationality' column). Create a bar chart showing the number of players from those 10 countries.
Don't forget to add a title and x-axis label to your chart.
If you are unable to find the top 10 countries but want the chance to demonstrate your plotting skills, use the following dummy data to create a bar chart:
Country Name | Num Players
============ | ===========
Country A | 100
Country B | 60
Country C | 125
Country D | 89
# Code here to get the top 10 countries with the most players
# Code here to plot a bar chart. A recommended figsize is (10, 6)
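The count-then-plot pattern can be sketched on a small invented Series. Everything here (`nationality`, `top_counts`, the four country labels, and keeping the top 3 rather than the top 10) is an assumption chosen to keep the example short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up nationalities for 17 imaginary players
nationality = pd.Series(
    ["Country A"] * 5 + ["Country B"] * 3 + ["Country C"] * 7 + ["Country D"] * 2
)

# value_counts() sorts by count, largest first, so head(n) keeps the n most common
top_counts = nationality.value_counts().head(3)

# Bar chart with a title and axis labels
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(top_counts.index, top_counts.values)
ax.set_title("Players per Country (toy data)")
ax.set_xlabel("Country")
ax.set_ylabel("Number of Players")
```

In a notebook with inline plotting enabled, the figure is displayed when the cell finishes running.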
2.2) Describe the relationship between StandingTackle and SlidingTackle, as shown in the scatter plot produced below.
# Run this cell without changes
fig, ax = plt.subplots()
ax.set_title('Standing Tackle vs. Sliding Tackle')
ax.set_xlabel('Standing Tackle')
ax.set_ylabel('Sliding Tackle')
x = df['StandingTackle']
y = df['SlidingTackle']
ax.scatter(x, y)
Please describe in words the relationship between these two features.
# Your written answer here
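A visual impression from a scatter plot can be cross-checked with a correlation coefficient. The sketch below is an optional aside on invented numbers: `feature_x` and `feature_y` are hypothetical column names, and the result says nothing about the tackle columns.

```python
import pandas as pd

# Made-up paired measurements that rise together
toy_df = pd.DataFrame({
    "feature_x": [10, 20, 30, 40, 50],
    "feature_y": [12, 19, 33, 41, 48],
})

# Series.corr computes the Pearson correlation by default:
# values near 1 indicate a strong positive linear relationship,
# values near -1 a strong negative one, and values near 0 little linear relationship
r = toy_df["feature_x"].corr(toy_df["feature_y"])

print(round(r, 3))
```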
# Code here to find the mean age and median age
In your own words, how are the mean and median related to each other and what do these values tell us about the distribution of the column 'Age'?
# Your written answer here
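To see how the two statistics can diverge, here is a small sketch on invented ages (the Series `ages` is hypothetical). A single large value pulls the mean upward while the median barely moves, which is commonly read as a sign of a right-skewed distribution.

```python
import pandas as pd

# Made-up ages: five similar values and one much larger value
ages = pd.Series([18, 19, 20, 21, 22, 40])

mean_age = ages.mean()      # sensitive to extreme values
median_age = ages.median()  # middle of the sorted values

print(mean_age, median_age)
```

When the mean and median are close, the distribution is often roughly symmetric, though a histogram is a more reliable check than the two numbers alone.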
Use the Nationality column.
# Code here to find the oldest player in Argentina
# Your written answer here
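Finding the row with the largest value inside one group can be done by filtering first and then using `idxmax`. This is a sketch on a made-up table; `toy_df`, `country_a`, and `oldest` are illustrative names.

```python
import pandas as pd

# Made-up players from two imaginary countries
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [24, 35, 29, 38],
    "Nationality": ["Country A", "Country B", "Country A", "Country B"],
})

# Boolean mask keeps only the rows for one country
country_a = toy_df[toy_df["Nationality"] == "Country A"]

# idxmax returns the index label of the largest Age; .loc fetches that row
oldest = country_a.loc[country_a["Age"].idxmax()]

print(oldest["Name"], oldest["Age"])
```

If several rows tie for the maximum, `idxmax` returns only the first of them.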
In this final section, we will work with various Python data types and try to accomplish certain tasks using some fundamental data structures in Python, rather than using Pandas DataFrames. Below, we've defined a dictionary with soccer player names as keys for nested dictionaries containing information about each player's age, nationality, and a list of teams they have played for.
# Run this cell without changes
players = {
'L. Messi': {
'age': 31,
'nationality': 'Argentina',
'teams': ['Barcelona']
},
'Cristiano Ronaldo': {
'age': 33,
'nationality': 'Portugal',
'teams': ['Juventus', 'Real Madrid', 'Manchester United']
},
'Neymar Jr': {
'age': 26,
'nationality': 'Brazil',
'teams': ['Santos', 'Barcelona', 'Paris Saint-Germain']
},
'De Gea': {
'age': 27,
'nationality': 'Spain',
'teams': ['Atletico Madrid', 'Manchester United']
},
'K. De Bruyne': {
'age': 27,
'nationality': 'Belgium',
'teams': ['Chelsea', 'Manchester City']
}
}
4.1) Create a list of all the keys in the players dictionary. Store the list of player names in a variable called player_names to use in the next question.
Use Python's documentation on dictionaries for help if needed.
# Replace None with appropriate code to get the list of all player names
player_names = None
# Run this cell without changes to check your answer
print(player_names)
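As a reminder of how dictionary keys behave, here is a minimal sketch on a different, made-up dictionary (`squads` and `team_names` are hypothetical names).

```python
# Made-up nested dictionary keyed by team name
squads = {
    "Team X": {"city": "City 1"},
    "Team Y": {"city": "City 2"},
    "Team Z": {"city": "City 3"},
}

# keys() returns a view object; list() turns it into a regular list.
# Since Python 3.7, dictionaries keep their insertion order.
team_names = list(squads.keys())

print(team_names)
```

Calling `list(squads)` gives the same result, because iterating over a dictionary yields its keys.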
4.2) Great! Now that we have the names of all players, let's use that information to create a list of tuples containing each player's name along with their nationality. Store the list in a variable called player_nationalities.
# Replace None with appropriate code to generate a list of tuples such that
# the first element is a player's name and the second is their nationality
# Ex: [('L. Messi', 'Argentina'), ('Cristiano Ronaldo', 'Portugal'), ...]
player_nationalities = None
# Run this cell without changes to check your answer
print(player_nationalities)
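Pairing each key with a value from its nested dictionary can be sketched with a list comprehension. The dictionary `squads` and the names `team_names` and `team_cities` below are invented for illustration.

```python
# Made-up nested dictionary keyed by team name
squads = {
    "Team X": {"city": "City 1", "founded": 1901},
    "Team Y": {"city": "City 2", "founded": 1925},
}

team_names = list(squads.keys())

# For each key, look up one field of the nested dictionary and
# pack the key and that field into a tuple
team_cities = [(name, squads[name]["city"]) for name in team_names]

print(team_cities)
```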
4.3) Define a function called get_players_on_team() that returns a list of the names of all the players who have played on a given team.
Your function should take two arguments:
- a dictionary of player information
- the team name (as a string) you are trying to find the players for
Be sure that your function has a return statement.
# Code here to define your get_players_on_team() function
# Run this cell without changes to check your answer
players_on_manchester_united = get_players_on_team(players, 'Manchester United')
print(players_on_manchester_united)
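The general pattern of collecting keys whose nested list contains a given item can be sketched as follows. The helper `get_keys_with_item` and the `authors` dictionary are hypothetical and unrelated to the players data; this is one possible approach, not the expected answer.

```python
def get_keys_with_item(records, field, item):
    """Return the keys of `records` whose list stored under `field` contains `item`."""
    matches = []
    for key, info in records.items():
        # `in` tests membership in the nested list
        if item in info[field]:
            matches.append(key)
    return matches


# Made-up data: authors and the genres they have written in
authors = {
    "Author A": {"genres": ["Mystery", "Romance"]},
    "Author B": {"genres": ["Science Fiction"]},
    "Author C": {"genres": ["Romance", "History"]},
}

print(get_keys_with_item(authors, "genres", "Romance"))
```

Returning an empty list when nothing matches keeps the return type consistent for callers.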