Slides for my HANIC presentation
You can find the longer-form writeup on my blog at themockup.blog (this will be a true statement... in the future). For now, the slide deck is my writeup!
To get the 2.2 GB of raw data, you'll need to download the 2000-2019 data via nflfastR; otherwise, my code as included should get you to the end result.
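A minimal sketch of that download step, assuming a recent nflfastR release that exports `load_pbp()`. This is not the code included in the repo; the helper name `download_pbp()` and the output file name are made up for illustration.

```r
# Sketch only: assumes a recent nflfastR that exports load_pbp().
# download_pbp() and the default file name are hypothetical.
download_pbp <- function(seasons = 2000:2019, path = "pbp_2000_2019.rds") {
  # load_pbp() fetches the play-by-play data for each requested season
  # and returns it as a single data frame
  pbp <- nflfastR::load_pbp(seasons)

  # Cache locally so the download only has to happen once
  saveRDS(pbp, path)
  invisible(pbp)
}

# Deliberately not run here, since this is the large download:
# pbp <- download_pbp()
```

Wrapping the call in a function keeps the heavy download from running by accident when the script is sourced.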
Full Disclosure:
- The Random Forest model takes about 20 min to run w/ 1000 trees.
- You lose about 3% accuracy w/ only 100 trees but cut your run time down to 2 min
- The logistic regression is almost as accurate as the Random Forest model and takes about 2 min as well
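The trade-offs above can be tried out on a small scale with something like the sketch below. It rests on several assumptions: the data is a synthetic stand-in (not the NFL play-by-play data), the `ranger` engine is my guess rather than a confirmed detail of the slides, and `fit_and_score()` is a hypothetical helper. Timings and accuracy on toy data will not match the numbers quoted above.

```r
library(tidymodels)

set.seed(2020)

# Synthetic stand-in data (NOT the NFL data): a binary outcome, two predictors
n <- 2000
toy <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  play_type = factor(if_else(x1 + 0.5 * x2 + rnorm(n) > 0, "pass", "run"))
)

split <- initial_split(toy, strata = play_type)
train <- training(split)
test  <- testing(split)

# Hypothetical helper: fit one model spec, then report test accuracy and fit time
fit_and_score <- function(spec) {
  start   <- Sys.time()
  fitted  <- fit(spec, play_type ~ ., data = train)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))

  acc <- predict(fitted, test) %>%
    bind_cols(test) %>%
    accuracy(truth = play_type, estimate = .pred_class)

  tibble(accuracy = acc$.estimate, seconds = elapsed)
}

# Assumption: ranger as the random forest engine, glm for logistic regression
specs <- list(
  rf_1000  = rand_forest(mode = "classification", trees = 1000) %>% set_engine("ranger"),
  rf_100   = rand_forest(mode = "classification", trees = 100) %>% set_engine("ranger"),
  logistic = logistic_reg() %>% set_engine("glm")
)

results <- bind_rows(lapply(specs, fit_and_score), .id = "model")
results
```

Only the `trees` argument differs between the two random forest specs, so any gap in the `seconds` column comes from the number of trees alone.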
Additional resources:
- tidymodels.org has step-by-step guides of various complexities
- I'll be adding additional NFL-focused examples at: TheMockup.blog
- Julia Silge's (a tidymodels maintainer) blog or video series
  - She has 10+ videos/blog posts covering various aspects of the full pipeline
  - Most recently covered predicting Beach Volleyball winners w/ a tuned XGBoost model
- Alison Hill's Workshop from rstudio::conf2020
- Gentle Intro to tidymodels on RStudio RViews blog
- Julia Silge's online free interactive course