Data-rich projects can quickly grow out of hand and become irreproducible in the absence of deliberate effort at organization, tool choice, and process. This course will teach basic principles of sound data science workflows and will develop skills in implementing them in appropriate state-of-the-art systems and languages (e.g., Python and R).
Course webpage: https://github.ubc.ca/MDS-2019-20/DSCI_522_dsci-workflows_students
Slack Channel: https://ubc-mds.slack.com/messages/522_dsci-workflows
By the end of the course, students are expected to be able to:
- Map a data analysis question to an appropriate analysis.
- Write R, Python and shell scripts for non-interactive data analysis.
- Run literate coding documents (Jupyter notebooks and R Markdown documents) non-interactively.
- Use a Git/GitHub fork-and-pull-request workflow to collaborate on a data analysis project.
- Automate data science workflows (e.g., using Make).
- Manage project software and environment dependencies (e.g., using Docker).
Position | Name | Slack Handle | GHE Handle | Office Hours |
---|---|---|---|---|
Lecture Instructor | Tiffany Timbers | @tiffany | @timberst | Thursday at 12:45 - 13:45 - location posted in the calendar |
Lab Instructor | Firas Moosvi | @Firas | @Firasm | NA |
Teaching Assistant | Javier Castillo-Arnemann | @Javier | NA | Posted on Calendar |
Teaching Assistant | Ozum Kafaee | @Ozum | NA | Posted on Calendar |
Teaching Assistant | Gary Zhu | @Gary | NA | Posted on Calendar |
Teaching Assistant | Kate Sedivy-Haley | @Kate | NA | Posted on Calendar |
Note: Attendance at office hours is optional.
This is a project-based course. You will work in randomly assigned groups of three (or four, if needed). You'll be evaluated as follows:
Assessment | Weight | Deadline | Location |
---|---|---|---|
Milestone 1 - Proposal and data download script | 10% | 2020-01-18 @ 18:00 | Submit to GitHub |
Milestone 2 - Working analysis scripts and report draft | 20% | 2020-01-25 @ 18:00 | Submit to GitHub |
Milestone 3 - Data analysis pipeline with Make | 20% | 2020-02-01 @ 18:00 | Submit to GitHub |
Milestone 4 - Final project submission (with ultimate reproducibility) | 30% | 2020-02-08 @ 18:00 | Submit to GitHub |
Teamwork | 20% | 2020-02-11 @ 18:00 | Submit to GitHub |
Lab Details:
Lab | Topic |
---|---|
1 | Teamwork activity, Tagged releases, Semantic versioning |
2 | TBD |
3 | TBD |
4 | TBD |
Lecture | Topic | Required Readings | Additional Readings |
---|---|---|---|
1 | Introduction to Data Science Workflows | ||
2 | Scaling up: read-eval-print-loop (REPL) processes versus non-interactive scripts | ||
3 | Scaling up cont'd: using literate coding documents (Jupyter notebooks and R Markdown documents) non-interactively. | ||
4 | Data Analysis pipelines and shell scripting | ||
5 | Automated workflows; introduction to the build/automation tool Make | ||
6 | Environment management: containerization with Docker part I | ||
7 | Environment management: containerization with Docker part II | ||
8 | Environment management: containerization with Docker part III & Reproducibility wrap-up | ||
- Art of Data Science by Roger Peng & Elizabeth Matsui (very cheap or even free!)
- Note: there are two packages. You only need the textbook ("The Book" package), not the lecture videos!
- Not So Standard Deviations podcast, co-hosted by Roger Peng of the Johns Hopkins Bloomberg School of Public Health and Hilary Parker of Stitch Fix.
- Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek
Please see the general MDS policies.
UBC provides resources to support student learning and to maintain healthy lifestyles but recognizes that sometimes crises arise and so there are additional resources to access including those for survivors of sexual violence. UBC values respect for the person and ideas of all members of the academic community. Harassment and discrimination are not tolerated nor is suppression of academic freedom. UBC provides appropriate accommodation for students with disabilities and for religious and cultural observances. UBC values academic honesty and students are expected to acknowledge the ideas generated by others and to uphold the highest academic standards in all of their actions. Details of the policies and how to access support are available here.