
Capstone Project datasets & instructions.


CapstoneProject

Step by step goals for completing the Girls Who Code at UM DCMB Capstone Project.

  • At the end of each meeting, upload the latest .ipynb to a Google Drive folder shared with your project group. Keep track of what you accomplished and what you still need to do, either in notebook comments, in a Google Doc, or in a group message on Slack.
  • You can communicate with your group & facilitator on Slack!

Data Analysis Step by Step

Get organized!

  • One partner should make a Google Drive folder and share it with the group mentor and partner(s)
  • Make a new Jupyter Notebook (from an existing notebook, File > New notebook) and move it into the new Drive folder

Read in the data

  • The code below can be used to read in one of the datasets already on GitHub:
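import pandas as pd

# Base URL of the datasets folder in the GitHub repository
url = "https://raw.githubusercontent.com/GWC-DCMB/CapstoneProject/master/datasets/"
# Path of one dataset inside that folder
filepath = "AP_exams/ap_exams_MI_2018.csv"
# Read the CSV file into a data frame
df = pd.read_csv(url + filepath)
# Show the first five rows
df.head()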
  • Start familiarizing yourself with your data. What are the data types in each column of the data set (e.g. float, string)?
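
One way to check the data type of each column is sketched below. The small data frame and its column names are made up for illustration and are not taken from any of the project datasets; with your own data you would use the `df` you read in above.

```python
import pandas as pd

# Hypothetical stand-in for a dataset; the column names are illustrative only
df = pd.DataFrame({
    "district": ["A", "B", "C"],
    "num_students": [120, 85, 240],
    "pass_rate": [0.61, 0.48, 0.73],
})

# dtypes returns one data type per column
column_types = df.dtypes
print(column_types)

# info() also prints the number of non-missing values in each column
df.info()
```

Columns that hold text usually show up as `object` (or a string type), whole numbers as an integer type, and decimals as a float type.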

Hypothesis generation

  • Refine the question or hypothesis you want to explore in your project
  • Make a plan for what steps you need to take to answer the question
  • Sketch out potential plots including x and y axes (do this on paper with your group)

Data cleaning

  • Start cleaning data programmatically. Add commands to your .ipynb.
  • You should be using pandas; check out the documentation
  • To help with data frame manipulation in pandas, check out this Jupyter Notebook
  • Which variables do you need? Which outliers should you remove? Which variables have too much missing data to be reliable?
  • Remember our example project Jupyter Notebook
  • A list of all the functions/methods/packages you've learned can be found here
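
The questions above about missing data and outliers could be approached roughly as follows. This is only a sketch: the data frame, the column names, the 50% missing-data cutoff, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and your group may decide on different rules for your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names and values are illustrative only
df = pd.DataFrame({
    "school": ["A", "B", "C", "D", "E", "F"],
    "num_exams": [40, 55, 38, 61, 47, 900],  # 900 looks suspicious
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan],
})

# Fraction of missing values in each column
missing_fraction = df.isna().mean()
print(missing_fraction)

# Keep only columns where at most half of the values are missing
# (the 0.5 cutoff is an arbitrary choice for this sketch)
columns_to_keep = missing_fraction[missing_fraction <= 0.5].index
cleaned = df[columns_to_keep]

# Flag outliers with the 1.5 * IQR rule (one common convention, not the only one)
q1 = cleaned["num_exams"].quantile(0.25)
q3 = cleaned["num_exams"].quantile(0.75)
iqr = q3 - q1
in_range = cleaned["num_exams"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Keep only the rows that fall inside the accepted range
cleaned = cleaned[in_range]
print(cleaned)
```

Before dropping a row or column, it is worth looking at it and writing down why it was removed, so the decision is recorded in your notebook.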

Data visualization

Science communication!