Ideas for measuring inconsistency in data

inconsistency

R scripts for measuring inconsistency in MLB ball and strike calls. For more details, see my paper in the Journal of Quantitative Analysis in Sports, "New metrics for evaluating home plate umpire consistency and accuracy."

The file pitches2017.Rda contains pitch-tracking data for all pitches during the 2017 MLB season. The data were scraped from http://gd2.mlb.com using the pitchRx package. The scripts in the main directory of this repository expect this file to be read into a variable named pitches and converted to a tibble:

library(dplyr)
library(tibble)
pitches <- as_tibble(readRDS("pitches2017.Rda"))

Scripts that require this data have the readRDS line commented out, because it is slow and needs to be executed only once per session. Other data can be loaded into the pitches variable instead, as long as it has the same column names as pitches2017.Rda.
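Before substituting another data set, it may help to compare its column names against the 2017 file. The following is a minimal sketch, not part of the repository: compare_columns is a hypothetical helper, and the toy data frames use made-up column names in place of the real pitch-tracking columns.

```r
# Hypothetical helper (not in this repository): report how a candidate
# data set's column names differ from those of a reference data set.
compare_columns <- function(candidate, reference) {
  list(
    missing = setdiff(names(reference), names(candidate)),  # in reference only
    extra   = setdiff(names(candidate), names(reference))   # in candidate only
  )
}

# Toy stand-ins; these column names are invented for illustration only.
reference <- data.frame(col_a = 1:2, col_b = c(0.1, 0.2), col_c = c("x", "y"))
candidate <- data.frame(col_a = 3:4, col_c = c("z", "w"), col_d = c(TRUE, FALSE))

col_diff <- compare_columns(candidate, reference)
col_diff$missing  # columns the candidate lacks
col_diff$extra    # columns the reference does not have
```

In practice, reference would be the tibble read from pitches2017.Rda and candidate the replacement data. An empty missing entry suggests the column names line up, although it says nothing about column types or units.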

Many of the scripts take a long time to run, so results for the 2017 season are saved as .Rda files. The following table lists each saved file and the script that generates it.

Data file         Generating script
games17inc.Rda    measure_games.R
umps17.Rda        measure_umps.R
conzonepoly.Rda   consensus_zones.R
conczKDE.Rda      consensus_zones.R
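The idea of saving slow results and reusing them can be sketched as a small compute-or-load pattern. This is an illustration only: load_or_generate and slow_summary are hypothetical names, and the sketch assumes results are written with saveRDS and read with readRDS, which may not match how the scripts listed above actually save their output.

```r
# Hypothetical helper (not in this repository): reuse a saved result if the
# file exists, otherwise compute it, save it, and return it.
# Assumes the saveRDS/readRDS format; the repository's scripts may differ.
load_or_generate <- function(path, generate) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- generate()
  saveRDS(result, path)
  result
}

# Toy usage with a temporary file and a stand-in for a slow computation.
cache_file <- tempfile(fileext = ".Rda")
slow_summary <- function() {
  Sys.sleep(0.1)  # placeholder for a long-running analysis
  data.frame(id = 1:3, value = c(0.5, 0.7, 0.9))
}

first  <- load_or_generate(cache_file, slow_summary)  # computes and writes the file
second <- load_or_generate(cache_file, slow_summary)  # reads the saved copy
```

The second call skips the computation entirely, which is the benefit the saved 2017 result files provide.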

For other years, use the file pitches2015-2018.Rda, which you can create with the script get2015-2018data.R. For example, to run everything on the 2018 season:

library(dplyr)
library(tibble)
library(lubridate)
pitches <- as_tibble(readRDS("pitches2015-2018.Rda")) %>%
  filter(year(date) == 2018)