BillPetti/baseballr

statcast_search has broken due to batspeed and swing length being added

mlascaleia opened this issue · 16 comments

The tibbles exported from statcast used to have 92 columns, now they have 94!

I foresee this being a recurring error as more and more stats are added. Here is my suggested fix, which turns the failure into a message instead of breaking the package:

# (somewhere within the statcast_search function before the payload is searched for)
colos <- c("pitch_type", "game_date", 
            "release_speed", "release_pos_x", "release_pos_z", 
            "player_name", "batter", "pitcher", 
            "events", "description", "spin_dir", 
            "spin_rate_deprecated", "break_angle_deprecated", 
            "break_length_deprecated", "zone", "des", 
            "game_type", "stand", "p_throws", 
            "home_team", "away_team", "type", 
            "hit_location", "bb_type", "balls", 
            "strikes", "game_year", "pfx_x", 
            "pfx_z", "plate_x", "plate_z", 
            "on_3b", "on_2b", "on_1b", "outs_when_up", 
            "inning", "inning_topbot", "hc_x", 
            "hc_y", "tfs_deprecated", "tfs_zulu_deprecated", 
            "fielder_2", "umpire", "sv_id", 
            "vx0", "vy0", "vz0", "ax", 
            "ay", "az", "sz_top", "sz_bot", 
            "hit_distance_sc", "launch_speed", "launch_angle", 
            "effective_speed", "release_spin_rate", 
            "release_extension", "game_pk", "pitcher_1", 
            "fielder_2_1", "fielder_3", "fielder_4", 
            "fielder_5", "fielder_6", "fielder_7", 
            "fielder_8", "fielder_9", "release_pos_y", 
            "estimated_ba_using_speedangle", "estimated_woba_using_speedangle", 
            "woba_value", "woba_denom", "babip_value", 
            "iso_value", "launch_speed_angle", "at_bat_number", 
            "pitch_number", "pitch_name", "home_score", 
            "away_score", "bat_score", "fld_score", 
            "post_away_score", "post_home_score", 
            "post_bat_score", "post_fld_score", "if_fielding_alignment", 
            "of_fielding_alignment", "spin_axis", 
            "delta_home_win_exp", "delta_run_exp")
# once the payload has been acquired:
colNumber <- ncol(payload)
if (colNumber > length(colos)) {
  # extra columns get placeholder names: newStat1, newStat2, ...
  # (count is colNumber - length(colos); the reverse is negative and 1:(negative) miscounts)
  newCols <- paste0("newStat", seq_len(colNumber - length(colos)))
  colos <- c(colos, newCols)
  message("New stats detected! baseballr will be updated soon to properly identify these stats")
}
# when the payload columns need to be named:
names(payload) <- colos

This way the function will still work when new stats are added, and their names can be filled in whenever you update the package.
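The same idea can be tried outside the package on a toy data frame. This is a minimal sketch, not baseballr's actual code: `name_columns()` and the three-name `known` vector are hypothetical stand-ins for the real function and the full name vector above.

```r
# Hypothetical helper (not part of baseballr): names the columns of `payload`
# from `known`, padding with placeholder names when extra columns appear.
name_columns <- function(payload, known) {
  extra <- ncol(payload) - length(known)
  if (extra > 0) {
    # more columns than known names: pad with newStat1, newStat2, ...
    known <- c(known, paste0("newStat", seq_len(extra)))
    message(extra, " new column(s) detected; using placeholder names")
  } else if (extra < 0) {
    # fewer columns than known names: the layout is not what we expect
    stop("payload has ", ncol(payload), " columns but ",
         length(known), " names were supplied")
  }
  names(payload) <- known
  payload
}

# Toy stand-in for the full name vector, and a payload with two extra columns
known <- c("pitch_type", "game_date", "release_speed")
payload <- data.frame(V1 = "FF", V2 = "2024-05-01", V3 = 95.1, V4 = 72.3, V5 = 7.1)
named <- name_columns(payload, known)
names(named)
```

Note that this only labels columns correctly if new stats are appended after the known ones. If they are inserted in the middle, positional naming would silently mislabel everything after them.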

It also fails on this function: scrape_statcast_savant_pitcher. Is there a workaround that can be applied, similar to the above?

Download the dev version with devtools::install_github("BillPetti/baseballr") and this should be fixed!

Thanks for updating! I do want to note that, with the fix as implemented, the code will still break in the same way whenever the statcast tibbles do not have exactly 94 columns. Just something worth noting!
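One way to avoid depending on an exact column count is to take the names from the data itself rather than assigning them by position. The sketch below assumes the download is a CSV with a header row, which this thread does not confirm; `csv_text` is made-up sample data, not a real Statcast response, and `expected` is an illustrative subset of names.

```r
# Made-up sample standing in for a downloaded CSV (assumption: header row present)
csv_text <- "pitch_type,game_date,release_speed,bat_speed,swing_length
FF,2024-05-01,95.1,72.3,7.1
SL,2024-05-01,86.4,NA,NA"

# header = TRUE takes the column names from the first line of the text
payload <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Names this script already knows about; anything else is reported, not fatal
expected <- c("pitch_type", "game_date", "release_speed")
unknown <- setdiff(names(payload), expected)
if (length(unknown) > 0) {
  message("Columns not yet known to this script: ", paste(unknown, collapse = ", "))
}
```

With names read from the header, a newly added column shows up under its real name and nothing needs to match a fixed count.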

Yep! We're going to add a more permanent fix, but wanted to get the hotfix out asap once the switch was made.

Thanks!

thanks for the update!

I reinstalled and am still getting the same column number error. I even did force = TRUE to make sure I got the newest version. Anything else I can try?

I am having the same issue. Would appreciate any possible help!

Did you install with install.packages("baseballr") or devtools::install_github("BillPetti/baseballr")?

I used devtools::install_github("BillPetti/baseballr"). Then to load the library it is library(baseballr), correct?

Yep! Did you restart your R session between installing and then using the package?

I believe I got it, thank you so much!
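For anyone else checking whether the reinstall took effect, a quick sketch to run in a fresh R session. It assumes the install was done through devtools/remotes, which normally record the GitHub commit in the package's DESCRIPTION as `RemoteSha`; for a CRAN install that field is simply absent.

```r
# Run in a fresh R session after installing
pkg <- "baseballr"
if (requireNamespace(pkg, quietly = TRUE)) {
  desc <- utils::packageDescription(pkg)
  cat("Installed version:", as.character(utils::packageVersion(pkg)), "\n")
  # RemoteSha is typically present for GitHub installs and NULL otherwise
  sha <- if (is.null(desc$RemoteSha)) "not recorded" else desc$RemoteSha
  cat("GitHub commit:", sha, "\n")
} else {
  cat(pkg, "is not installed\n")
}
```

If the version or commit has not changed after reinstalling, the old copy is probably still loaded or installed in a different library path.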

Thanks so much for sharing this. Can you please explain how I would work this fix into the following line of code:
Season_Data <- scrape_statcast_savant_batter_all(start_date = "2023-09-27", end_date = "2023-10-01")

When I run the next line,
colNumber <- ncol(payload)
I get the following error:
Error in ncol(payload) : object 'payload' not found

It's happening again:
Error in setnames(x, value) :
Can't assign 92 names to a 112 column data.table

The dev installation method hasn't helped so far.

Thanks so much for sharing this. Can you please explain how I would work this fix into the following line of code: Season_Data <- scrape_statcast_savant_batter_all(start_date = "2023-09-27", end_date = "2023-10-01")

When I run the next line, colNumber <- ncol(payload) I get the following error: Error in ncol(payload) : object 'payload' not found

This won't work because payload only exists inside the function. You have to edit the function itself with the code I wrote above, then run the function with the updated code.
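A small illustration of why the variable is not visible from the console. `fetch_data()` is a hypothetical stand-in for the real scraping function, not anything in baseballr:

```r
# Hypothetical function standing in for the scraper: `payload` is local to it
fetch_data <- function() {
  payload <- data.frame(a = 1:3, b = 4:6)  # exists only while the function runs
  names(payload) <- c("x", "y")            # so any renaming has to happen here
  payload
}

result <- fetch_data()
exists("payload")  # FALSE unless you defined your own `payload` at top level
names(result)      # the returned value carries the names set inside the function
```

Anything that touches `payload` therefore has to go inside the function body; only the returned value is available afterwards.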

Alright gang, I made a bad, janky fix for this issue that will act as a stopgap until the package is actually updated. It works by taking unaccounted-for columns and just calling them "newStat". I make no guarantees that it doesn't break other functionality in the package, but it will get you what you need in the meantime. I'd love to make a cleaner fix, but this is a hobby, not my job lol.

Run 'devtools::install_github("mlascaleia/baseballr")' in a new R session to install. This will overwrite your current version of baseballr, and it will not pick up any later updates made to the main baseballr repository.

That works beautifully. Thank you very much for your efforts!!!