Slides, code, cheat sheets, and RStudio lab notebooks for the "Applied Machine Learning" course, Spring 2019

GitHub Repository for STAA 577

Overview

RStudio lab notebooks, full R code, cheat sheets, resources, and ad hoc notes from the “Applied Machine Learning” course, Spring 2019.


Why use GitHub?

We have decided to place the course materials in a GitHub repository:

  1. to familiarize you with this widely used collaborative coding tool
  2. so that you will have access to them beyond your tenure at CSU, when you venture into the official job market.

Jenny Bryan and Jim Hester summarize the benefits of GitHub in this fantastic reference here:

If you ever plan to use version control with GitHub, I strongly recommend reading it in detail.


Course Lab Content

  • Intro Labs
    • Lab 00: Basic Exploring
    • Lab 01: Subsetting (data frames)
    • Lab 02: Data Wrangling with dplyr and the tidyverse
    • Lab 03: Skipped to synchronize the course with the ISLR textbook
  • Lab 04: Classification
    • The S&P Stock Market Data Set
    • Logistic Regression
    • Discriminant Analysis
    • KNN: K-Nearest Neighbors
  • Lab 05: Cross Validation
    • The Auto Data Set
    • Cross Validation (by hand)
    • LOOCV (leave-one-out)
    • K-fold CV
    • The Bootstrap
  • Lab 06: Subset Selection
    • The Hitters Data Set
    • Subset Selection
    • Shrinkage Methods: Ridge Regression
    • Shrinkage Methods: The Lasso
  • Lab 07: Beyond Linearity
    • The Wage Data Set
    • Polynomial Regression
    • Polynomial Logistic Regression
    • Spline Regression
    • Generalized Additive Models
  • Lab 08: Tree-based Methods
    • The Carseats Data Set
    • Classification Trees
    • Regression Trees
    • Bagging
      • Random Forest
    • Boosting
    • Appendices
    • Resources
  • Lab 09: Support Vector Machines
    • Create training data
    • Support Vector Classifier
    • Support Vector Machine
    • ROC curves
  • Lab 10: Unsupervised Learning
    • Principal Component Analysis (PCA)
    • K-means Clustering
    • Hierarchical Clustering

Datasets for STAA 577

  • nycflights13
    • New York City airport flight data from 2013 (must be installed)
    • install with install.packages("nycflights13", repos="http://cran.rstudio.com")
  • iris
    • classic iris flower data set from Fisher (comes with R installed)
  • mtcars
    • canonical Motor Trend (USA) car data set (comes with R installed)
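
As a quick check that the data sets above are available, the following is a minimal sketch. It assumes the flights package is the one published on CRAN as nycflights13 and only touches it if it is already installed; the object name flights is chosen here for illustration.

```r
# iris and mtcars ship with base R (datasets package), so no install is needed.
data(iris)
data(mtcars)

# Inspect dimensions and column types.
dim(iris)
str(mtcars)

# The flights data lives in a separate package; load it only if installed.
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  print(dim(flights))
} else {
  message("nycflights13 is not installed; see the install command above.")
}
```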

Cheatsheets

Previewing HTML on GitHub

  • Fairly useful tool to preview HTML docs without having to clone the repository
  • Right-click the *.html file, copy the link, then go here and paste the GitHub-specific HTML link

Sad But True

Stu’s Looping Rules for R

  1. Always use a vectorized solution over iteration when possible, otherwise … go to #2.
  2. Use a functional, usually one of the apply() family or a loop-wrapper function, because R is a functional language and functionals improve readability, unless …
    • modifying in place: if you are modifying or transforming certain subsets (columns) of a data frame.
    • recursive problems: whenever an iteration depends on the result of the previous iteration, a loop is better suited, because each call made by a functional runs in isolation and cannot easily see the previous result.
    • while loops: in problems where it is unknown how many iterations will be performed, while-loops are well suited and preferred over a functional.
  3. If you must use a loop, ensure the following:
    • Initialize new objects: prior to the loop, allocate the necessary space ahead of time. Do NOT “grow” a vector on-the-fly within a loop (this is terribly slow).
    • Optimize operations: do NOT perform operations inside the loop that could be done either up front or applied in a vectorized fashion following the loop. Enter the loop, do the bare minimum, then get out.
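
The rules above can be sketched in a few lines of base R. This is only an illustration: the toy vector x, the Fibonacci recurrence, and the Collatz-style while loop are examples chosen here, not part of the course labs.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Rule 1: prefer a vectorized solution.
squares_vec <- x^2

# Rule 2: otherwise use a functional; vapply() also checks the return type.
squares_fun <- vapply(x, function(v) v^2, numeric(1))

# Rule 3, recursive problem: each value depends on the previous two,
# so use a loop, but allocate the result vector before entering it.
n <- 10
fib <- numeric(n)        # pre-allocated, never grown inside the loop
fib[1:2] <- 1
for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}

# Rule 2 exception, while loop: the number of iterations is unknown up front.
value <- 6
steps <- 0
while (value != 1) {
  value <- if (value %% 2 == 0) value / 2 else 3 * value + 1
  steps <- steps + 1
}
```

The vectorized and functional versions return the same values; the functional earns its keep only when the per-element work cannot be expressed as a single vectorized call.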

Hadley Wickham Links

Jenny Bryan’s Links

Max Kuhn’s Links

Modeling Framework (thx Max Kuhn)

Memory Usage and rsample:

The rsample package is smarter than you might think.
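
One way to see this for yourself is to compare the size of a set of bootstrap resamples with the size of the original data. The sketch below is hedged: it assumes the rsample and lobstr packages are installed (it does nothing otherwise), and sim_data and size_ratio are hypothetical names on simulated data. If every resample were a full copy, 50 resamples would take roughly 50 times the memory of the original; the ratio printed here should come out far smaller if the resamples share the underlying data.

```r
# Simulated data (an assumption for illustration): 10,000 rows, 10 numeric columns.
set.seed(577)
sim_data <- as.data.frame(matrix(rnorm(10000 * 10), ncol = 10))

size_ratio <- NA_real_
if (requireNamespace("rsample", quietly = TRUE) &&
    requireNamespace("lobstr", quietly = TRUE)) {
  # 50 bootstrap resamples of the same data frame.
  boots <- rsample::bootstraps(sim_data, times = 50)

  # lobstr::obj_size() accounts for memory shared between objects.
  size_ratio <- as.numeric(lobstr::obj_size(boots)) /
    as.numeric(lobstr::obj_size(sim_data))
  print(size_ratio)
}
```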

Vignettes

What is the Tidyverse?

Information about the:


Created on 2019-01-27 by Rmarkdown (v1.11) and R version 3.5.2 (2018-12-20).