Reproducible computation at scale in R with targets
Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The `targets` R package keeps results up to date and reproducible while minimizing the number of expensive tasks that actually run. `targets` learns how your pipeline fits together, skips costly runtime for steps that are already up to date, runs the rest with optional implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the output matches the underlying code and data. In other words, the package saves time while increasing our ability to trust the conclusions of the research. `targets` surpasses the most burdensome permanent limitations of its predecessor, `drake`, to achieve greater efficiency and provide a safer, smoother, friendlier user experience. This hands-on workshop teaches `targets` using a realistic case study. Participants begin with the R implementation of a machine learning project, convert the workflow into a `targets`-powered pipeline, and efficiently maintain the output as the code and data change.
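
As a rough illustration of the ideas above, a pipeline is usually declared in a `_targets.R` file at the project root. The sketch below is not the workshop's case study: the file `data.csv` (assumed to have numeric columns `x` and `y`), the helper functions `read_data()`, `fit_model()`, and `summarize_fit()`, and the target names are all hypothetical placeholders chosen for illustration.

```r
# _targets.R: a minimal, hypothetical sketch of a targets pipeline.
library(targets)

# Hypothetical helper functions. The workshop notebooks define their own.
read_data <- function(path) {
  read.csv(path)
}

fit_model <- function(data) {
  # Assumes the data has numeric columns named x and y.
  lm(y ~ x, data = data)
}

summarize_fit <- function(model) {
  coef(summary(model))
}

# The pipeline is a list of targets. The dependency graph is inferred
# from the symbols that each command mentions.
list(
  # format = "file" tracks the contents of the file at the returned path,
  # so editing data.csv invalidates the targets downstream of it.
  tar_target(data_file, "data.csv", format = "file"),
  tar_target(raw_data, read_data(data_file)),
  tar_target(model, fit_model(raw_data)),
  tar_target(summary_table, summarize_fit(model))
)
```

With a file like this in place, `targets::tar_make()` runs the pipeline and skips targets that are already up to date, `targets::tar_outdated()` lists the targets that would rerun, and `targets::tar_read(summary_table)` loads a stored result. `targets::tar_visnetwork()` draws the dependency graph if the `visNetwork` package is installed.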
To take the workshop in the cloud:

- Sign up for a free account at https://rstudio.cloud.
- Log into https://rstudio.cloud/project/1699460 to access a free instance of RStudio Server in the cloud.
- Proceed through the R notebooks in the syllabus in order.

To run the workshop on your own machine instead:

- Install R from https://www.r-project.org.
- Install RStudio Desktop from https://rstudio.com/products/rstudio/download/#download.
- Download or clone the code at https://github.com/wlandau/targets-tutorial.
- Open the tutorial as an RStudio project in RStudio Desktop.
- Run the setup script to install the required R and Python packages.

Post an issue to https://github.com/wlandau/targets-tutorial to ask for help. Be sure to follow the code of conduct.
Topic | Materials |
---|---|
Intro | slides |
Functions | 1-functions.Rmd |
Pipelines | 2-pipelines.Rmd |
Changes | 3-changes.Rmd |
Files | 4-files.Rmd |
Branching | 5-branching.Rmd |
Debugging | 6-debugging.Rmd |
Challenge | 7-challenge.Rmd |
This schedule budgets time for a 4-hour iteration of the workshop (8 AM to noon).
Topic | Format | Breakout rooms | Minutes | Start | End | Materials |
---|---|---|---|---|---|---|
Intro presentation | lecture | no | 20 | 8:00 | 8:20 | slides |
Q&A | discussion | no | 10 | 8:20 | 8:30 | slides |
Functions for the case study | exercises | yes | 15 | 8:30 | 8:45 | 1-functions.Rmd |
Review functions | lecture | no | 5 | 8:45 | 8:50 | 1-functions.Rmd |
Break | break | no | 10 | 8:50 | 9:00 | |
Build up a pipeline | exercises | yes | 20 | 9:00 | 9:20 | 2-pipelines.Rmd |
Review building up a pipeline | lecture | no | 5 | 9:20 | 9:25 | 2-pipelines.Rmd |
Iterate on changes | exercises | yes | 20 | 9:25 | 9:45 | 3-changes.Rmd |
Review iterating on changes | lecture | no | 5 | 9:45 | 9:50 | 3-changes.Rmd |
Break | break | no | 10 | 9:50 | 10:00 | |
External files | exercises | yes | 20 | 10:00 | 10:20 | 4-files.Rmd |
Review external files | lecture | no | 5 | 10:20 | 10:25 | 4-files.Rmd |
Dynamic branching | exercises | yes | 20 | 10:25 | 10:45 | 5-branching.Rmd |
Review dynamic branching | lecture | no | 5 | 10:45 | 10:50 | 5-branching.Rmd |
Break | break | no | 10 | 10:50 | 11:00 | |
Interactive debugging | exercises | yes | 20 | 11:00 | 11:20 | 6-debugging.Rmd |
Review interactive debugging | lecture | no | 5 | 11:20 | 11:25 | 6-debugging.Rmd |
Challenge exercise | exercises | yes | 20 | 11:25 | 11:45 | 7-challenge.Rmd |
Review challenge exercise | lecture | no | 5 | 11:45 | 11:50 | 7-challenge.Rmd |
Q&A | discussion | no | 10 | 11:50 | 12:00 | |