nflverse/nflfastR

[FEATURE REQ] Allow calculating career stats using the `calculate_stats()` function

Closed · 1 comment

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

I'd like to be able to calculate players' career stats (since 1999). I used to be able to do this with the calculate_player_stats() function. However, my understanding is that the calculate_player_stats() function has been superseded by the calculate_stats() function, and I am not able to do this directly with the calculate_stats() function.

If I use the seasonal statistics, I can post-process them to get some of the career statistics (i.e., the variables that can be summed across years, such as passing touchdowns). However, some variables cannot be meaningfully summed or averaged across years to get the "true" career statistic (e.g., completion percentage, QBR). These can be estimated from seasonal statistics with additional post-processing (for example, weighted averages based on the number of games played in each season), but doing it that way would be hackish.

Describe the solution you'd like

Players' stats across all available seasons (since 1999) could previously be calculated with the calculate_player_stats() family of functions:

nfl_pbp <- nflreadr::load_pbp(seasons = TRUE)

careerStats_offense <- nflfastR::calculate_player_stats(
  nfl_pbp,
  weekly = FALSE)

careerStats_defense <- nflfastR::calculate_player_stats_def(
  nfl_pbp,
  weekly = FALSE)

careerStats_kicking <- nflfastR::calculate_player_stats_kicking(
  nfl_pbp,
  weekly = FALSE)

It would be nice to add this capability to the calculate_stats() function. For instance, it would be helpful to add "career" as an option to the summary_level argument:

calculate_stats(
  seasons = TRUE,
  summary_level = c("season", "week", "career"),
  stat_type = c("player", "team"),
  season_type = c("REG", "POST", "REG+POST")
)

Describe alternatives you've considered

No response

Additional context

No response

At the moment calculate_stats() calculates a total of 118 different variables. Only one of them (!), namely passing_cpoe, cannot simply be summed. However, the code for this is freely accessible here and easy to adapt.

passing_stats_from_pbp <- pbp %>%
  dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) %>%
  dplyr::select(
    "season", "week", "team" = "posteam",
    "player_id" = "passer_player_id", "qb_epa", "cpoe"
  ) %>%
  dplyr::group_by(!!!grp_vars) %>%
  dplyr::summarise(
    passing_epa = sum(.data$qb_epa, na.rm = TRUE),
    # mean will return NaN if all values are NA, because we remove NA
    passing_cpoe = if (any(!is.na(.data$cpoe))) mean(.data$cpoe, na.rm = TRUE) else NA_real_
  ) %>%
  dplyr::ungroup()
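
As a rough illustration of how the snippet above might be adapted, the sketch below applies the same summary logic with the grouping reduced to the player only, so that passing_cpoe is averaged over every pass of a career rather than per season or week. It uses base R and a small made-up data frame in place of real play-by-play data. The helper name `summarise_passing` and all values are hypothetical, and this is not how nflfastR itself does it.

```r
# Toy play-by-play rows (made-up values); column names follow the snippet above.
pbp <- data.frame(
  season = c(2021, 2021, 2022, 2022, 2022),
  passer_player_id = c("A", "A", "A", "A", "B"),
  play_type = c("pass", "pass", "pass", "qb_spike", "pass"),
  qb_epa = c(0.5, -0.2, 1.0, -0.1, 0.3),
  cpoe = c(10, -5, 20, NA, NA),
  stringsAsFactors = FALSE
)

# Same filter as in the snippet above.
passes <- pbp[pbp$play_type %in% c("pass", "qb_spike"), ]

# Hypothetical helper: summarise one player's passes over all seasons.
summarise_passing <- function(d) {
  data.frame(
    player_id = d$passer_player_id[1],
    passing_epa = sum(d$qb_epa, na.rm = TRUE),
    # return NA instead of NaN when every cpoe value is missing
    passing_cpoe = if (any(!is.na(d$cpoe))) mean(d$cpoe, na.rm = TRUE) else NA_real_
  )
}

# Group by player only (no season or week), then bind the per-player rows.
career <- do.call(
  rbind,
  lapply(split(passes, passes$passer_player_id), summarise_passing)
)
print(career)
```

In this toy data, player A's per-season means are 2.5 (2021) and 20 (2022), while the career value computed over all three passes with a non-missing cpoe is about 8.33. That gap is why passing_cpoe has to be recomputed from the play-by-play rows instead of being combined from seasonal values.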

All other stats can either be summed or calculated from the summed stats.
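
A minimal sketch of that idea, assuming a seasonal table with one row per player and season: counting stats are summed per player first, and rate stats are then derived from the career sums. The data frame below is made up, and its column names are chosen only for illustration; in practice the input would be the season-level output of calculate_stats().

```r
# Toy seasonal table (made-up values, illustrative column names).
seasonal <- data.frame(
  player_id   = c("A", "A", "B", "B"),
  season      = c(2021, 2022, 2021, 2022),
  completions = c(300, 100, 200, 250),
  attempts    = c(500, 120, 400, 400),
  passing_tds = c(25, 10, 15, 20)
)

# Step 1: sum the counting stats over all seasons for each player.
career <- aggregate(
  cbind(completions, attempts, passing_tds) ~ player_id,
  data = seasonal,
  FUN = sum
)

# Step 2: derive rate stats from the career sums.
career$completion_pct <- career$completions / career$attempts

# For comparison only: a naive average of the per-season rates.
seasonal$completion_pct <- seasonal$completions / seasonal$attempts
naive <- aggregate(completion_pct ~ player_id, data = seasonal, FUN = mean)

print(career)
print(naive)
```

For player A the two approaches disagree (400 / 620 from the sums versus the mean of 0.6 and about 0.83 from the seasonal rates), because the seasons have very different numbers of attempts. Deriving the rate from the summed counts avoids that distortion without needing the full play-by-play data.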

Performing the complete calculation for all available seasons is extremely inefficient and can lead to memory problems on some computers. I will therefore not implement this.