My project cleans the current WHR22 dataset, then performs an exploratory data analysis (EDA) and presents several visualizations to highlight my findings.
- Create and call at least 3 functions or methods, at least one of which must return a value that is used somewhere else in your code (I used a variety of Pandas methods and created 2 custom functions to highlight minimums and maximums)
- Read data from an external file, such as text, JSON, CSV, etc., and use that data in your application (The data comes from an Excel file that was cleaned and exported as CSV)
- Visualize data in a graph, chart, or other visual representation of data (I used Seaborn to create a heatmap)
- Use Plotly or Matplotlib to create charts/graphs of data (I used Plotly for a geographic representation & Matplotlib for a variety of visualizations)
- Source data should not be modified/changed - clean data should be stored separately (Data was converted to a CSV for use and stored in a Gist)
- Use a Jupyter notebook to document your data analysis (I used Google Colab for presentation and documentation)
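As an illustration of the first item in the list above, custom functions that highlight minimums and maximums could look roughly like the sketch below. This is not the notebook's actual code: the function names, colors, and sample values are hypothetical. Each function returns a list of CSS strings, which is the return value that pandas' `Styler.apply` consumes.

```python
import pandas as pd


def highlight_max(col: pd.Series, color: str = "lightgreen") -> list:
    """Return one CSS string per cell, coloring the column's maximum."""
    is_max = col == col.max()
    return [f"background-color: {color}" if flag else "" for flag in is_max]


def highlight_min(col: pd.Series, color: str = "salmon") -> list:
    """Return one CSS string per cell, coloring the column's minimum."""
    is_min = col == col.min()
    return [f"background-color: {color}" if flag else "" for flag in is_min]


# Hypothetical sample data, not values from the real dataset.
sample = pd.DataFrame({"country": ["A", "B", "C"], "score": [5.1, 7.4, 3.9]})

# The returned lists can be inspected directly ...
max_styles = highlight_max(sample["score"])
min_styles = highlight_min(sample["score"])

# ... or, in a notebook, passed to the pandas Styler:
# sample.style.apply(highlight_max, subset=["score"]).apply(highlight_min, subset=["score"])
```

Returning plain lists keeps the functions easy to test outside a notebook, while still working with `DataFrame.style.apply` when rendered in Colab.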
I strongly recommend running the file in Colab itself. Simply click this link and interact with the notebook by running the cells (choose Run all from the Runtime menu).
You can also click the ipynb files in the repo. This opens a preview with an 'Open in Colab' button in the top left-hand corner. Click the button to open the notebook so you can run the cells and see the report in action.
If you have to run it in your own environment, make sure the following libraries are installed so the project runs properly.
import pandas as pd
import numpy as np
from urllib.request import urlretrieve
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gs
%matplotlib inline
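Before running the notebook locally, a quick check like the following sketch can show which of the libraries are missing. The package list is an assumption based on the imports above plus Plotly, which the project also uses; `missing_packages` is a hypothetical helper, not part of the notebook.

```python
from importlib.util import find_spec

# Assumed package list: taken from the imports above, plus Plotly.
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "plotly"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```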
This report examines the most recent dataset, which has been converted to a CSV and stored as a Gist here.
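Loading the CSV from the Gist can be sketched with `urlretrieve` and `pandas.read_csv`, as below. The helper name `load_report`, the local file name, and the URL pattern in the comment are placeholders, not the project's real values; substitute the raw URL of the Gist linked above.

```python
from urllib.request import urlretrieve

import pandas as pd


def load_report(url: str, local_name: str = "whr22.csv") -> pd.DataFrame:
    """Download the CSV at `url` to `local_name`, then load it with pandas."""
    path, _headers = urlretrieve(url, local_name)
    return pd.read_csv(path)


# Placeholder URL pattern (assumption): replace with the raw Gist URL.
# df = load_report("https://gist.githubusercontent.com/<user>/<gist-id>/raw/<file>.csv")
# df.head()
```

Saving a local copy first means later runs can read `whr22.csv` directly, and the source data in the Gist is never modified.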