Slides for the "A first-year undergraduate data science course" talk at useR 2016 at Stanford.
The course is titled Better Living Through Data Science: Exploring / Modeling / Predicting / Understanding. Previous course home pages:
- Fall 2015: https://www.stat.duke.edu/courses/Fall15/sta112.01/ (uses git/GitHub)
- GitHub repo for course materials: https://github.com/mine-cetinkaya-rundel/sta112_f15
- Fall 2014: https://www.stat.duke.edu/courses/Fall14/sta112.01/ (does not use git/GitHub)
- Course materials not on GitHub, but available on request
- Course info
- Overview: audience, content, description
- Structure:
- Skills: data wrangling, EDA, visualization, basic inference, modeling, effective communication of results
- Computation: R + RStudio + git/GitHub
- Case studies: movie reviews, sports, airline delays, Paris paintings, ...
- Assessment: in-class team exercises, individual HW, midterm + final project, take-home final exam
- Computation:
- R/RStudio: server
- R Markdown: why and how
- Why:
- Noble goal: The only workflow is a reproducible workflow
- Teaching goal:
- Seeing code and output in one place helps learning
- Syntax highlighting
- Efficiency goal: Easier grading!
- How: Knit early and often
- git/GitHub via RStudio: why and how
- Why: Early introduction + marketability
- How:
- As course management system
- For team collaboration
- Details on org/repo
- Challenges:
- Remember to pull before starting work
- Resolving merge conflicts
- Sometimes RStudio interface isn't sufficient
- First assignment is an individual assignment (not team)
- Exercise examples:
- Modeling Paris paintings
- Basketball data scraping + Shiny
- Interest and impact:
- Data on student interest in the course
- Curricular changes inspired by the course
- Comment on gender balance