This repository contains a collection of the relevant coursework I completed to earn the Data Analysis with Python certification from freeCodeCamp.org. You can verify the certification here.
If you want a strong sense of my fluency in Python and how I communicate via comments in the context of code development, each project folder contains a notebook file (identified by the *.ipynb extension) for that purpose. In addition, copies of the files I submitted to earn the certification are also present in each project folder, and are named using the actual project title (for example, Project 1's submission file is mean_var_std.py). The fundamental Python code is the same between the submission files and their respective notebook presentation, aside from a few test cases and/or comments.
As a result of working on this certification, I gained proficiency with the following Python modules:
- Pandas
- Matplotlib
- Seaborn
- Numpy
Note: two variables used below were declared in the global scope: "df", which is the DataFrame holding the time series data, and "month_labels", which is a list of strings representing full month names.
import pandas as pd
import seaborn as sns
import matplotlib as mpl
def draw_bar_plot():
# Copy and modify data for monthly bar plot
# Extract months and years to individual columns with a lambda
# retitle the columns so they'll match the project requirements
df_bar = pd.DataFrame(
data={
'Years': df.index.to_series().apply(lambda t: t.year),
'Month': df.index.to_series().apply(lambda t: t.month),
'Average Page Views': df['value']
}
)
# use groupby to select each unique month/year combination, returning the mean.
# reset the index so we don't confuse ourselves or our plotting backend.
df_bar = df_bar.groupby(['Month', 'Years']).mean().reset_index()
# now that everything is grouped and we have the mean for the pageviews,
# set the values for the "Month" column to actual month names.
df_bar['Month'] = df_bar['Month'].apply(lambda t: month_labels[int(t) - 1])
# Use seaborn to draw the plot.
fig = sns.catplot(
data=df_bar,
x='Years',
y='Average Page Views',
kind='bar',
hue='Month',
legend_out=False
)
# Need to set the return value to "fig.fig" because we used a seaborn figure-level function,
# And the actual matplotlib figure is a property of the returned object.
fig = fig.fig
fig.savefig('bar_plot.png')
return fig
And the output:
This plot shows two linear regression lines on a scatter plot of sea level measurements taken over the course of a century.