
Analysis of the Hitters Dataset

Primary language: R

Hitters-Dataset-Analysis

Analysis of the Hitters dataset, in which baseball players' statistics are used to predict their salaries.

  • Determined the most important features in predicting baseball players' salaries using:
    • Linear Regression
    • Best Subsets
    • Stepwise selection (forward and backward)
    • Lasso
    • Elastic Net
    • Adaptive Lasso
    For best subsets and forward and backward stepwise selection, I chose the number of features that minimizes the Bayesian Information Criterion (BIC).
  • Fit and visualized regularization paths for:
    • Lasso
    • Elastic Net at α = 0.33 and α = 0.66
    • Adaptive Lasso
    The regularization paths for each model can be found in the RegularizationPaths folder.
  • Determined the average prediction mean squared error (MSE) for:
    • Least Squares
    • Ridge Regression
    • Best Subsets
    • Stepwise selection (forward and backward)
    • Lasso
    • Elastic Net
    • Adaptive Lasso
    The average MSE of each model is plotted in AvgPredictionMSE.png.
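The BIC-based choice of model size described above can be sketched with base R's `step()`, which scores models by `extractAIC()`; passing `k = log(n)` replaces the default AIC penalty (`k = 2`) with the BIC penalty. This is a minimal sketch, not the repository's code: `bic_stepwise` is a hypothetical helper, and the built-in `mtcars` data (predicting `mpg`) stands in for Hitters. With the ISLR package installed one could presumably pass `na.omit(ISLR::Hitters)` and `"Salary"` instead. Best subset selection needs an exhaustive search (for example `regsubsets()` from the leaps package), which this sketch does not cover.

```r
# Hypothetical helper: forward or backward stepwise selection scored by BIC.
bic_stepwise <- function(dat, response, direction = c("forward", "backward")) {
  direction <- match.arg(direction)
  n <- nrow(dat)
  predictors <- setdiff(names(dat), response)
  full_formula <- reformulate(predictors, response = response)
  null_formula <- reformulate("1", response = response)
  # Forward search starts from the intercept-only model, backward from the full model.
  start <- if (direction == "forward") {
    lm(null_formula, data = dat)
  } else {
    lm(full_formula, data = dat)
  }
  step(start,
       scope = list(lower = null_formula, upper = full_formula),
       direction = direction,
       k = log(n),  # BIC penalty instead of the default AIC penalty k = 2
       trace = 0)   # suppress the per-step printout
}
```

For example, `formula(bic_stepwise(mtcars, "mpg", "backward"))` shows which predictors survive. Because `step()` only moves when the criterion improves, the selected model never has a higher BIC than the model it started from.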
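To show what the regularization paths above trace out, here is a small cyclic coordinate-descent solver for the elastic-net objective in the form used by glmnet, (1/(2n))·||y − Xb||² + λ·(α·||b||₁ + (1 − α)/2·||b||²), on standardized predictors. Setting `alpha = 1` gives the lasso, and `alpha = 0.33` or `0.66` gives the elastic-net paths listed above. This is an illustrative sketch under those assumptions, not the repository's implementation: `enet_path` and `soft_threshold` are hypothetical helpers, and the lambda grid (`n_lambda`, `ratio`) is an arbitrary choice. In practice the glmnet package computes these paths, and an adaptive lasso can be fit with its `penalty.factor` argument using weights such as 1/|b_OLS|; this sketch does not cover adaptive weights.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0).
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: elastic-net coefficient path by cyclic coordinate descent.
# Returns the lambda grid and a p x length(lambda) matrix of standardized coefficients.
enet_path <- function(X, y, alpha = 1, n_lambda = 50, ratio = 1e-3,
                      tol = 1e-7, max_iter = 10000) {
  stopifnot(alpha > 0, alpha <= 1)
  n <- nrow(X)
  # Standardize so each column has mean 0 and (1/n) * sum(x_j^2) = 1.
  pop_sd <- apply(X, 2, function(v) sqrt(mean((v - mean(v))^2)))
  X <- scale(X, center = TRUE, scale = pop_sd)
  y <- y - mean(y)
  # Smallest lambda at which every coefficient is zero.
  lambda_max <- max(abs(crossprod(X, y))) / (n * alpha)
  lambdas <- exp(seq(log(lambda_max), log(lambda_max * ratio), length.out = n_lambda))
  p <- ncol(X)
  path <- matrix(0, p, n_lambda, dimnames = list(colnames(X), NULL))
  b <- numeric(p)
  r <- y  # residual y - X %*% b, with b starting at zero
  for (k in seq_along(lambdas)) {  # warm start from the previous lambda
    lam <- lambdas[k]
    for (iter in seq_len(max_iter)) {
      max_change <- 0
      for (j in seq_len(p)) {
        old <- b[j]
        # (1/n) * x_j' (partial residual), using (1/n) * x_j'x_j = 1.
        z <- sum(X[, j] * r) / n + old
        b[j] <- soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))
        if (b[j] != old) {
          r <- r - X[, j] * (b[j] - old)
          max_change <- max(max_change, abs(b[j] - old))
        }
      }
      if (max_change < tol) break
    }
    path[, k] <- b
  }
  list(lambda = lambdas, beta = path)
}
```

A path plot in the spirit of the RegularizationPaths folder can then be drawn with base graphics, e.g. `fit <- enet_path(as.matrix(mtcars[, -1]), mtcars$mpg, alpha = 0.66)` followed by `matplot(log(fit$lambda), t(fit$beta), type = "l")`. Warm starts along a decreasing lambda grid keep each solve cheap, because consecutive solutions are close.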
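The README does not say exactly how the average prediction MSE was computed. One common scheme, assumed here, repeats a random train/test split many times and averages the test-set MSE, reusing the same seed so that every model is scored on identical splits. In the sketch below, `avg_prediction_mse` and `ls_all` are hypothetical, the 70/30 split and 100 repetitions are arbitrary choices, and `mtcars` again stands in for Hitters. Each model is supplied as a function that fits on the training rows and returns predictions for the test rows.

```r
# Hypothetical helper: average test-set MSE over repeated random train/test splits.
# fit_predict(train, test) must return a numeric vector of predictions for `test`.
avg_prediction_mse <- function(dat, response, fit_predict,
                               n_reps = 100, train_frac = 0.7, seed = 1) {
  set.seed(seed)  # same splits for every model scored with the same seed
  n <- nrow(dat)
  mses <- numeric(n_reps)
  for (i in seq_len(n_reps)) {
    train_idx <- sample(n, size = floor(train_frac * n))
    train <- dat[train_idx, , drop = FALSE]
    test <- dat[-train_idx, , drop = FALSE]
    pred <- fit_predict(train, test)
    mses[i] <- mean((test[[response]] - pred)^2)
  }
  mean(mses)
}

# Example model: least squares on all predictors.
ls_all <- function(train, test) predict(lm(mpg ~ ., data = train), newdata = test)
```

For instance, `avg_prediction_mse(mtcars, "mpg", ls_all)` scores least squares; a ridge, lasso, or stepwise model would plug in its own fit-and-predict function. Note that `set.seed()` inside the helper resets the global random number generator as a side effect.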

To run

To replicate the results described above, run the corresponding R script:
  • hittersfeatureselection.R performs feature selection on the Hitters dataset.
  • mse_analysis.R computes the average prediction MSE of each model on the Hitters dataset.