This is Jae Yeon Kim's remix version of the D-Lab’s Introduction to Machine Learning in R workshop designed by Chris Kennedy and Evan Muzzall. This version of the workshop focuses on the tidymodels framework and its applications.
View the associated slides here.
- Background on machine learning
  - Classification vs regression
  - Performance metrics
- Data preprocessing
  - Missing data
  - Train/test splits
- Algorithm walkthroughs
  - Lasso
  - Decision trees
  - Random forests
  - Gradient boosted machines
  - SuperLearner ensembling
  - Principal component analysis
  - Hierarchical agglomerative clustering
- Challenge questions (TBD)
Please follow the notes in participant-instructions.md.
The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to function in a standalone manner.
After installing and loading the packages in 01-overview.Rmd, run all the code in 02-preprocessing.Rmd to preprocess the data. Then open any one of the seven algorithm R Markdown files and choose "Run All" to see the results and visualizations!
We assume that participants have familiarity with:
- Basic R syntax
- Statistical concepts such as mean and standard deviation
Please bring a laptop with the following:
- R version 3.6 or greater
- RStudio integrated development environment (IDE) is highly recommended but not required.
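To check these requirements before the workshop, a minimal sketch like the following can be run in the R console. The variable names `min_version`, `r_ok`, and `has_tidymodels` are illustrative, and checking for the tidymodels package is an assumption based on this workshop's focus; the authoritative package list is in 01-overview.Rmd.

```r
# Minimal sketch: confirm the local R installation meets the workshop requirement.
min_version <- "3.6.0"  # "R version 3.6 or greater"

# getRversion() returns the running R version; it can be compared to a string.
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R ", getRversion(), " meets the minimum version (", min_version, ").")
} else {
  message("R ", getRversion(), " is older than ", min_version, "; please upgrade.")
}

# Assumption: tidymodels is among the packages installed in 01-overview.Rmd.
# requireNamespace() checks availability without attaching the package.
has_tidymodels <- requireNamespace("tidymodels", quietly = TRUE)

if (!has_tidymodels) {
  message("tidymodels is not installed yet; see 01-overview.Rmd.")
}
```

If `r_ok` is `FALSE`, update R before installing the workshop packages.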
Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!
The slides were made using xaringan, an R package that wraps remark.js. Check out Chapter 7 of R Markdown: The Definitive Guide if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests - with an example implementation in R.