Duke University - Department of Statistical Science Computing Bootcamp 2021

This repository contains the computing bootcamp materials for incoming Ph.D. and M.S. students in the Department of Statistical Science at Duke University. These materials are adapted from those developed by Shawn Santo, Mine Çetinkaya-Rundel, and Colin Rundel.

Getting started

Computing resources

  • Duke computing resources and getting help
    • Duke VPN
    • Duke software
    • Compute cluster
  • DSS computing resources and getting help
    • RStudio Pro servers

Version control and R

Version control

  • Introduce Git and GitHub
  • Initialize a project directory and understand the Git workflow
  • Discuss the role of version control in reproducibility
  • Discuss version control best practices

Introduction to reproducible research

  • Recognize the problems that reproducible research helps address, through a brief discussion of case studies that went wrong and how reproducible practices might have helped
  • Identify pain points in getting your analysis to be reproducible
  • Discuss the role of documentation, sharing, automation, and organization in making your research more reproducible
  • Introduce some tools to solve these problems, specifically R / RStudio / R Markdown
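
One pain point that often comes up in practice is randomness: a simulation that draws new random numbers on every run cannot be reproduced exactly. The sketch below is only an illustration, written in Python with the standard library; the function name `simulate_mean` and the seed value are hypothetical and do not come from the bootcamp slides. It shows how fixing a seed makes a simulated result repeatable.

```python
import random


def simulate_mean(seed, n=1000):
    """Draw n standard normal values from a seeded generator and return their mean."""
    # A dedicated generator keeps the global random state untouched.
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n


if __name__ == "__main__":
    first = simulate_mean(seed=2021)
    second = simulate_mean(seed=2021)
    # The same seed reproduces the same draws, so the two means are identical.
    print(first == second)
```

The same idea carries over to R, where `set.seed()` plays the role of the seeded generator.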

Organizing your project to facilitate reproducible research

  • Organize projects and folders to enable reproducibility and reusability
  • Understand the structure of data files and the importance of documenting all changes made
  • Create a reproducible project workflow using R / RStudio / R Markdown
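
As a rough illustration of the organization ideas above, the following Python sketch creates a project skeleton that separates raw data, processed data, scripts, and output. The folder names and the helper `create_project` are assumptions made for this example, not a layout prescribed by the bootcamp; the sessions themselves work with R / RStudio / R Markdown.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical layout; the folder names are illustrative only.
LAYOUT = ["data/raw", "data/processed", "scripts", "output/figures"]


def create_project(root: Path) -> List[Path]:
    """Create the project skeleton under root and return the created folders."""
    created = []
    for rel in LAYOUT:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # A README is one place to document the data files and any changes made to them.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project\n\nFiles in data/raw are never edited by hand.\n")
    return created


if __name__ == "__main__":
    # A temporary directory keeps the demonstration from touching real files.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_project(Path(tmp) / "my_analysis"):
            print(folder.relative_to(tmp))
```

Keeping raw data in its own folder and writing derived files elsewhere means every change to the data is made by a script that can be rerun.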

R / RStudio and R Markdown

  • Navigate R Markdown and RStudio
  • Analyze data and create graphics with the tidyverse packages
  • Discuss workflow

Python

  • Navigate Jupyter notebooks
  • Introduce Python data structures, control flow, functions, and the basics of object-oriented programming
  • Discuss popular Python packages including NumPy, SciPy, pandas, matplotlib, seaborn, and scikit-learn
  • Highlight similarities and differences between Python and R
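
The topics in this list can be sketched together in a few lines. The example below is a minimal illustration using only the standard library; the class `Sample`, the function `summarize`, and the data values are hypothetical and are not taken from the bootcamp notebooks.

```python
from statistics import mean


def summarize(values):
    """Return basic summary statistics for a non-empty list of numbers."""
    # A dict maps names to values, loosely comparable to a named list in R.
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


class Sample:
    """A named collection of measurements (hypothetical class for illustration)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def above(self, threshold):
        # A list comprehension combines a loop and a condition in one expression.
        return [v for v in self.values if v > threshold]

    def summary(self):
        return summarize(self.values)


if __name__ == "__main__":
    heights = Sample("heights_cm", [160, 172, 181, 168])
    print(heights.summary())
    print(heights.above(170))
    # Python indexes from 0, whereas R indexes from 1.
    print(heights.values[0])
```

One difference from R that this highlights: methods belong to the object (`heights.summary()`), while R code more often passes the object to a function (`summary(heights)`).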

References

See slides for references related to specific topics.