Predicting Baseball Statistics: Classification and Regression Applications in Python Using scikit-learn and TensorFlow-Keras

Primary language: Jupyter Notebook

Predicting-Baseball-Statistics

Classification and Regression Applications in Python Using scikit-learn

This repository predicts baseball statistics from MLB Statcast metrics.

Goals

  • Using MLB Statcast Metrics, summarize and examine baseball statistics.
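A minimal sketch of what summarizing batted-ball data might look like with pandas. The column names (`launch_speed`, `launch_angle`, `hit_distance`, `event`) and every value in the table are hypothetical placeholders, not data or names taken from this repository.

```python
import pandas as pd

# Hypothetical batted-ball rows; all values are made up for illustration only.
batted_balls = pd.DataFrame(
    {
        "launch_speed": [103.2, 88.5, 97.1, 110.4, 75.0, 99.8],    # mph
        "launch_angle": [27.0, 12.0, -4.0, 31.0, 45.0, 18.0],      # degrees
        "hit_distance": [405.0, 210.0, 35.0, 432.0, 180.0, 310.0], # feet
        "event": ["home_run", "single", "field_out",
                  "home_run", "field_out", "double"],
    }
)

numeric_cols = ["launch_speed", "launch_angle", "hit_distance"]

# Count, mean, standard deviation, and quartiles for each numeric column.
summary = batted_balls[numeric_cols].describe()

# Average metrics per outcome, to see how outcomes differ.
mean_by_event = batted_balls.groupby("event")[numeric_cols].mean()

# Share of batted balls that were home runs (the positive class for classification).
home_run_rate = (batted_balls["event"] == "home_run").mean()

print(summary)
print(mean_by_event)
print(f"Home run rate: {home_run_rate:.3f}")
```

The home run rate is worth checking early, because a low rate signals the class imbalance that the classification goals address.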

Classification

  • Build and train models to predict home runs and extra-base hits using the following approaches:

    • Logistic Regression
    • k-Nearest Neighbors
    • Decision-Classification Tree
    • Random Forest Classification
    • Support Vector Machine Classification
    • XGBoost Classification
  • Implement over-sampling to address class imbalance and improve how well the models generalize.

  • Apply regularization and cross-validation techniques for model evaluation, selection, and optimization.
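The classification goals above can be sketched roughly as follows, assuming scikit-learn and NumPy are installed. The data is synthetic, the home-run rule is a toy stand-in for real labels, and `random_oversample` is a hypothetical helper written for this example (it assumes the minority class is labeled 1). It is not the repository's actual code. Only two of the listed models are shown, and over-sampling is applied to the training folds only, so the held-out fold keeps its original class balance.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-in for Statcast features (assumption: two features only).
n = 1000
exit_velocity = rng.normal(89, 13, n)   # mph
launch_angle = rng.normal(12, 25, n)    # degrees
X = np.column_stack([exit_velocity, launch_angle])

# Toy labeling rule: hard-hit balls in a narrow angle band count as home runs.
y = ((exit_velocity > 98) & (launch_angle > 20) & (launch_angle < 38)).astype(int)


def random_oversample(X_train, y_train, seed=0):
    """Duplicate minority-class rows (label 1) until both classes are the same size."""
    minority = y_train == 1
    n_extra = int((~minority).sum() - minority.sum())
    if n_extra <= 0:
        return X_train, y_train
    X_extra, y_extra = resample(
        X_train[minority], y_train[minority],
        replace=True, n_samples=n_extra, random_state=seed,
    )
    return np.vstack([X_train, X_extra]), np.concatenate([y_train, y_extra])


models = {
    # C is the inverse of the L2 regularization strength (smaller C = stronger penalty).
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(C=1.0)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the home-run rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: [] for name in models}

for train_idx, test_idx in cv.split(X, y):
    # Balance the training fold only; evaluate on the untouched test fold.
    X_bal, y_bal = random_oversample(X[train_idx], y[train_idx])
    for name, model in models.items():
        fitted = clone(model).fit(X_bal, y_bal)
        predictions = fitted.predict(X[test_idx])
        scores[name].append(f1_score(y[test_idx], predictions, zero_division=0))

mean_f1 = {name: float(np.mean(values)) for name, values in scores.items()}
print(mean_f1)
```

F1 is used here instead of accuracy because, with few home runs, a model that always predicts "no home run" would still score high on accuracy.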

Regression

  • Build and train models to predict hit distance using the following approaches:

    • Linear Regression
    • Decision-Regression Tree
    • Random Forest Regression
  • Apply regularization (Ridge, Lasso, Elastic Net) and cross-validation (k-fold) techniques for model evaluation, selection, and optimization.
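The regression goals above might look roughly like this, again assuming scikit-learn and NumPy. The features, the linear relationship used to generate `hit_distance`, and the `alpha` values are illustrative assumptions rather than results or settings from this repository; in practice the penalty strengths would be tuned.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for Statcast features (assumption: two features only).
n = 500
exit_velocity = rng.normal(89, 13, n)   # mph
launch_angle = rng.normal(12, 25, n)    # degrees
X = np.column_stack([exit_velocity, launch_angle])

# Toy linear relationship plus noise; not a physical model of ball flight.
hit_distance = 2.5 * exit_velocity + 3.0 * launch_angle + rng.normal(0, 20, n)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                          # L2 penalty
    "lasso": Lasso(alpha=0.1),                          # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5), # mix of L1 and L2
}

# 5-fold cross-validation with shuffling for a reproducible split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}

for name, model in models.items():
    # Standardize features so the penalties treat both features on the same scale.
    pipeline = make_pipeline(StandardScaler(), model)
    neg_mse = cross_val_score(
        pipeline, X, hit_distance, cv=cv, scoring="neg_mean_squared_error"
    )
    # Square root of the mean squared error averaged over folds, in feet.
    cv_rmse[name] = float(np.sqrt(-neg_mse.mean()))

for name, rmse in cv_rmse.items():
    print(f"{name}: {rmse:.1f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, which avoids leaking information from the held-out fold into the model.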