This is a personal machine learning project in which I analyze an NHL dataset describing roughly 25,000 hockey games played over approximately 20 years. I apply several classification approaches in Python and compare their performance. I use this analysis as practice for the following:
- identifying appropriate variables to include as features
- handling duplicated and missing values
- encoding categorical variables with different approaches
- fitting logistic regression models
- fitting random forest classifiers
- fitting gradient boosted decision tree classifiers
- assessing model performance using AUROC (area under the ROC curve)
- identifying important features in the model

This project can be found on Kaggle: https://www.kaggle.com/jpalmer37/nhl-away-teams