/DSC180A-DS-Methodology

Concepts in Data Science Methodology and Software Development

DSC 180A Capstone Quarter I

Contents

This repository contains information on the overall Capstone sequence and material for the lecture component of the course.

The course materials for each domain of inquiry are maintained by the domain expert. Links to materials for each domain may be found below; otherwise, contact the section leader for your domain of choice.

Course Times and Locations

Lecture

Lecture is held on Mondays at two different times, in the same location:

  • Monday 9:00am - 9:50am, CENTR 222 (A00)
  • Monday 10:00am - 10:50am, CENTR 222 (B00)

Discussion

You must attend the discussion corresponding to your chosen domain of inquiry. Attendance is mandatory.

Section         Time            Location   Title
Discussion A01  W 9am-9:50am    CENTR 207  Quantitative Measurement of Artistic Style
Discussion A02  W 9am-9:50am    WLH 2113   Wikipedia Edit Wars
Discussion A03  W 9am-9:50am    SDSC E145  Fair Policing and Predictive Policing
Discussion B01  W 10am-10:50am  CENTR 207  Clustering the Human Genome
Discussion B02  W 10am-10:50am  WLH 220    Malware and Graph Embeddings

Lab

Lab hours are for one-on-one help from both domain experts and methodological experts. Unless separately scheduled with domain experts, lab hours are held Fridays 9:00 - 10:50 in the CSE Basement (B250).

Syllabus

The syllabus for the course may be found here.

Course Schedule

Week  Topic: Methodology        Topic: Domain
1     Introduction              Intro to domain problem
2     Anatomy of a DS project   Data generating process (context)
3     Handling data             Description of data
4     Version control           Domain specific techniques I
5     Workflow patterns I       Domain specific techniques II
6     Workflow patterns II      Discussion of main result
7     Version control and data  Standards for evaluation in domain
8     Environment independence  Impacts and ethics
9     Advanced data handling    Related questions in domain
10    Multilingual workflows    Project proposals

Computing Resources

You are welcome to develop your work on your own computer; however, DataHub is available for your use as well. These servers are at least as large as your laptop, and you can use them either as Jupyter servers or via a command-line interface. As the quarters progress, they may be provisioned for more memory-intensive jobs.