Data Science for MD/PhD Students

Goals

The goal of this workshop is to show MD/PhD students how informatics-based approaches can inform their work. The workshop is divided into multiple parts.

Motivating Question

We are curious about how sleep affects cardiovascular disease risk. How can we use open, accessible data as preliminary evidence in a research grant application to help answer this question? What steps do we need to take?

Dataset

Sleep Heart Health Study data from https://sleepdata.org

Proposed Outline of Workshop (about 4 hrs total, with break)

  1. What data is out there? (20-30 min)
  2. Problem Formulation (15 min)
  3. Specifying predictive models using knowledge for a research question (30 min)
  4. Mapping our model into available public data (15 min)
  5. Break (15 min)
  6. Assessing association of identified covariates with outcome (30 min)
  7. Building the model using logistic regression (1.5 hrs)
  8. Communicating our results to others (30 min)
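
Steps 6 and 7 of the outline can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the workshop's actual analysis: the data are simulated, and the variable names (`sleep_hours`, `age`, `cvd_event`) and the coefficients used to generate the outcome are hypothetical and do not come from the Sleep Heart Health Study.

```r
# Simulated data only; these are NOT Sleep Heart Health Study values.
set.seed(42)
n <- 500
sim <- data.frame(
  sleep_hours = rnorm(n, mean = 7, sd = 1.2),        # hypothetical covariate
  age         = round(runif(n, min = 40, max = 80))  # hypothetical covariate
)

# Hypothetical relationship, used only to generate a binary outcome.
log_odds <- -3 + 0.05 * sim$age - 0.3 * sim$sleep_hours
sim$cvd_event <- rbinom(n, size = 1, prob = plogis(log_odds))

# Step 6: assess the association of one covariate with the outcome
# using a single-predictor logistic regression.
screen_fit <- glm(cvd_event ~ sleep_hours, data = sim, family = binomial())
summary(screen_fit)

# Step 7: build the multivariable logistic regression model.
fit <- glm(cvd_event ~ sleep_hours + age, data = sim, family = binomial())
summary(fit)

# Exponentiate coefficients to report odds ratios with Wald 95% intervals.
odds_ratios <- exp(coef(fit))
or_intervals <- exp(confint.default(fit))
print(cbind(odds_ratio = odds_ratios, or_intervals))
```

Reporting odds ratios rather than raw coefficients tends to be easier to communicate to a clinical audience, which ties into step 8.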

Requirements

The sleep study data comes with usage restrictions. Ideally, we would use RStudio.cloud to simplify installation, but the existing data restrictions prevent this. Each participant therefore needs the following:

  1. Installation of R/RStudio on a personal computer
  2. Installation of project/data using usethis::use_project()
  3. Signed Data Use Agreement with http://sleepdata.org