Step-by-step goals for completing the Girls Who Code at UM DCMB Capstone Project.
- At the end of each meeting, upload the latest .ipynb to a Google Drive folder shared with your project group. Record what you accomplished and what you still need to do as comments in the notebook, in a Google Doc, or in a group message on Slack.
- You can communicate with your group & facilitator on Slack!
- One partner should make a Google Drive folder and share it with the group mentor and partner(s)
- Make a new Jupyter Notebook (from an existing notebook, File > New notebook) and move it into the new Drive folder
- The code below can be used to read in one of the datasets already on GitHub:
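```python
import pandas as pd

# Base URL of the datasets folder on GitHub, plus the path to one CSV file inside it
url = "https://raw.githubusercontent.com/GWC-DCMB/CapstoneProject/master/datasets/"
filepath = "AP_exams/ap_exams_MI_2018.csv"
df = pd.read_csv(url + filepath)
# Show the first five rows to check that the data loaded
df.head()
```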
- Start familiarizing yourself with your data. What is the data type of each column in the dataset (e.g. float, string)?
- Refine the question or hypothesis you want to explore in your project
- Make a plan for what steps you need to take to answer the question
- Sketch out potential plots including x and y axes (do this on paper with your group)
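One way to answer the data-type question above is to ask pandas directly. The sketch below uses a small made-up data frame so it runs on its own; the column names (`school`, `students_tested`, `mean_score`) are hypothetical and will not match your dataset. In your notebook, skip the made-up data and use the `df` you loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in data (not from the real datasets).
# In your notebook, use the df you already loaded instead.
df = pd.DataFrame({
    "school": ["A", "B", "C"],
    "students_tested": [120, 85, 40],
    "mean_score": [3.1, 2.7, None],
})

# Data type of every column.
# Text columns are reported as "object" or a string dtype, depending on your pandas version.
print(df.dtypes)

# Column names, non-null counts, and data types in one summary
df.info()

# Number of missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)
```

Columns with many missing values, or with a data type you did not expect (for example, numbers stored as text), are good candidates to look at first when you start cleaning.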
- Start cleaning data programmatically. Add commands to your .ipynb.
- You should be using pandas; check out the documentation
- To help with data frame manipulation in pandas check out this Jupyter Notebook
- Which variables do you need? Which outliers should you remove? Which variables have too much missing data to be reliable?
- Remember our example project Jupyter Notebook
- A list of all the functions/methods/packages you've learned can be found here
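The cleaning questions above (which variables to keep, which outliers to remove, which variables have too much missing data) could be approached along these lines. This is only a sketch under stated assumptions: the data frame and its column names are made up, the 50% missing-data cutoff is an arbitrary choice, and the 1.5 × IQR rule is just one common way to flag outliers. Pick thresholds that make sense for your own question.

```python
import pandas as pd

# Hypothetical stand-in data (not from the real datasets)
df = pd.DataFrame({
    "school": ["A", "B", "C", "D", "E"],
    "students_tested": [120, 85, 40, 5000, 95],
    "mean_score": [3.1, 2.7, None, 3.0, 2.9],
    "notes": [None, None, "late", None, None],
})

# 1. Fraction of missing values in each column (0.0 = none missing, 1.0 = all missing)
missing_fraction = df.isna().mean()

# 2. Keep only columns with at most 50% missing values (assumed cutoff)
keep_cols = missing_fraction[missing_fraction <= 0.5].index
cleaned = df[keep_cols]

# 3. Drop rows that are missing the variable we want to analyze
cleaned = cleaned.dropna(subset=["mean_score"])

# 4. Remove outliers in students_tested using the 1.5 * IQR rule
q1 = cleaned["students_tested"].quantile(0.25)
q3 = cleaned["students_tested"].quantile(0.75)
iqr = q3 - q1
in_range = cleaned["students_tested"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range]

print(cleaned)
```

Each step makes a new, smaller data frame and leaves `df` unchanged, so you can compare the data before and after cleaning and explain in your Methods slide what you removed and why.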
- Start visualizing your data using matplotlib
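As a starting point for the visualization step, here is a minimal scatter plot sketch. The data and column names are made up for illustration; swap in your cleaned data frame and the x and y variables from the plots you sketched on paper.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data (not from the real datasets)
df = pd.DataFrame({
    "students_tested": [120, 85, 40, 95],
    "mean_score": [3.1, 2.7, 2.5, 2.9],
})

# Create a figure and one set of axes to draw on
fig, ax = plt.subplots()

# One point per row: x is students_tested, y is mean_score
ax.scatter(df["students_tested"], df["mean_score"])

# Always label your axes and give the plot a title
ax.set_xlabel("Students tested")
ax.set_ylabel("Mean score")
ax.set_title("Mean score vs. students tested (made-up data)")
```

In a Jupyter Notebook the figure is displayed when the cell finishes running. If you run the code as a plain Python script instead, add `plt.show()` at the end.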
- Make a presentation including Background, Methods, Hypothesis, and Results
- Template Presentation
- Review the example presentation by Rucheng Diao
- Practice your presentation