This repository contains the computing bootcamp materials for incoming Ph.D. and M.S. students in the Department of Statistical Science at Duke University. These materials are adapted from those developed by Shawn Santo, Mine Çetinkaya-Rundel, and Colin Rundel.
- Duke computing resources and getting help
- Duke VPN
- Duke software
- Compute cluster
- DSS computing resources and getting help
- RStudio Pro servers
- Introduce git and GitHub
- Initiate a project directory, understand the git workflow
- Discuss the role of version control in reproducibility
- Discuss version control best practices
- Recognize the problems that reproducible research helps address, with a brief discussion of case studies that went wrong and how reproducible practices might have helped
- Identify pain points in getting your analysis to be reproducible
- Understand the role of documentation, sharing, automation, and organization in making your research more reproducible
- Introduce some tools to solve these problems, specifically R / RStudio / R Markdown
- Organize projects and folders to enable reproducibility and reusability
- Understand the structure of data files and the importance of documenting all changes made
- Create a reproducible project workflow using R / RStudio / R Markdown
- Navigate R Markdown and RStudio
- Analyze data and create graphics with the tidyverse package
- Discuss workflow
- Navigate Jupyter notebooks
- Introduce Python data structures, control flow, functions, and the basics of object-oriented programming
- Discuss popular Python packages including NumPy, SciPy, pandas, matplotlib, seaborn, and scikit-learn
- Highlight similarities and differences between Python and R
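
As a small taste of the Python topics listed above (data structures, control flow, functions, and basic object-oriented programming), here is a minimal sketch. It is not taken from the bootcamp slides; the `Student` class, the `summarize` function, and the sample roster are hypothetical names and data made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    """Hypothetical record type, used only for this illustration."""
    name: str
    scores: list = field(default_factory=list)  # data structure: a list

    def mean_score(self):
        # Control flow: guard against an empty list to avoid dividing by zero.
        if not self.scores:
            return None
        return sum(self.scores) / len(self.scores)


def summarize(students):
    """Map each student's name to their mean score, skipping students with no scores."""
    summary = {}  # data structure: a dictionary
    for s in students:
        m = s.mean_score()
        if m is not None:
            summary[s.name] = m
    return summary


# Made-up sample data.
roster = [
    Student("A", [90, 80]),
    Student("B", []),
    Student("C", [70]),
]
result = summarize(roster)
print(result)
```

The example uses only the standard library, so it should run as-is in a Jupyter notebook cell or a plain Python session.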
See slides for references related to specific topics.