[BUG] Computed WP != Archived WP - Appears OT related.
andrewtek opened this issue · 1 comments
Is there an existing issue for this?
- I have searched the existing issues
Have you installed the latest development version of the package(s) in question?
- I have installed the latest development version of the package.
If this is a data issue, have you tried clearing your nflverse cache?
- I have cleared my nflverse cache and the issue persists.
What version of the package do you have?
nflfastR: installed 4.6.0, CRAN 4.6.0, dev 4.6.0.9000 (behind: dev)
Describe the bug
For 48 plays so far in 2023, the win probability (WP) computed by nflfastR::build_nflfastR_pbp(game_id) does not match the WP archived in nflfastR::load_pbp(2023).
The affected plays appear to come from games that went to overtime (OT). For instance:
    game_id          qtr  drive  play_id  play_type  wp_archived  wp_computed
 1: 2023_01_BUF_NYJ  4    21     3882                0.5037537    0.3967606
 2: 2023_01_BUF_NYJ  5    22     3902     kickoff    0.5037537    0.3967606
 3: 2023_01_BUF_NYJ  5    22     3918     no_play    0.5037537    0.3967606
 4: 2023_01_BUF_NYJ  5    22     3942     pass       0.4389834    0.3538505
 5: 2023_01_BUF_NYJ  5    22     3965     run        0.3915674    0.3144271
 6: 2023_01_BUF_NYJ  5    22     3987     pass       0.3318032    0.2638814
 7: 2023_01_BUF_NYJ  5    22     4010     punt       0.2552552    0.2038933
 8: 2023_02_LAC_TEN  4    NA     3966                0.5037537    0.3967606
 9: 2023_02_LAC_TEN  5    21     3985     kickoff    0.5037537    0.3967606
10: 2023_02_LAC_TEN  5    21     4001     pass       0.5037537    0.3967606
11: 2023_02_LAC_TEN  5    21     4024     pass       0.4562042    0.3605613
12: 2023_02_LAC_TEN  5    21     4047     pass       0.3898785    0.3070003
13: 2023_02_LAC_TEN  5    21     4070     punt       0.2804652    0.2217094
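To make the size of the gap easier to see, the rows above can be loaded into a data frame and the difference and ratio computed per play. This is only a rough sketch in base R using the values reported in this issue; the object name `reported` and the columns `diff` and `ratio` are my own names for illustration and are not part of nflfastR.

```r
# Values copied from the mismatch table reported in this issue
reported <- data.frame(
  game_id = rep(c("2023_01_BUF_NYJ", "2023_02_LAC_TEN"), times = c(7, 6)),
  qtr = c(4, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5),
  play_id = c(3882, 3902, 3918, 3942, 3965, 3987, 4010,
              3966, 3985, 4001, 4024, 4047, 4070),
  wp_archived = c(0.5037537, 0.5037537, 0.5037537, 0.4389834, 0.3915674,
                  0.3318032, 0.2552552, 0.5037537, 0.5037537, 0.5037537,
                  0.4562042, 0.3898785, 0.2804652),
  wp_computed = c(0.3967606, 0.3967606, 0.3967606, 0.3538505, 0.3144271,
                  0.2638814, 0.2038933, 0.3967606, 0.3967606, 0.3967606,
                  0.3605613, 0.3070003, 0.2217094)
)

# Hypothetical helper columns: absolute gap and relative size of the two values
reported$diff <- reported$wp_archived - reported$wp_computed
reported$ratio <- reported$wp_computed / reported$wp_archived

# Count the reported mismatches per quarter (qtr 5 is overtime)
by_qtr <- table(reported$qtr)

print(reported[, c("game_id", "qtr", "play_id", "diff", "ratio")])
print(by_qtr)
```

In the rows reported here, the archived value is the larger of the two in every case, and all but the two quarter-4 rows fall in quarter 5.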
The reprex below is a short script that takes a few minutes to run. It compares the archived WP for every 2023 play against the computed WP. Any play whose two values differ after rounding to 6 decimal places is reported in the output.
Reprex
library(dplyr)

# Clear the nflverse cache
nflreadr::.clear_cache()

# Load the 2023 season directly from the archive
pbp <- nflfastR::load_pbp(2023)

# Get the unique game IDs
game_ids <- unique(pbp$game_id)

# Process each game, looking for mismatches
mismatch_dfs <- lapply(game_ids, function(gid) {
  # Report progress
  nflfastR:::user_message(paste0("Processing game ", gid, "."), "todo")

  # Subset the archived plays for this game. The argument is named `gid`
  # rather than `game_id`: inside filter(), `game_id == game_id` compares the
  # column with itself and keeps every row.
  archived <- filter(pbp, game_id == gid)

  # Recompute the play-by-play without filling the console with messages
  suppressMessages({
    computed <- nflfastR::build_nflfastR_pbp(gid) %>%
      as.data.frame()
  })

  # Merge the two data frames on the common key columns
  merged_df <- merge(
    archived, computed,
    by = c("game_id", "qtr", "drive", "play_id", "play_type"),
    suffixes = c("_archived", "_computed")
  )

  # Keep the rows where the two 'wp' values differ at 6 decimal places
  result <- subset(merged_df, round(wp_archived, 6) != round(wp_computed, 6))

  # Report status
  if (nrow(result) > 0) {
    nflfastR:::user_message(paste0("Processing game ", gid, " has mismatches."), "info")
  } else {
    nflfastR:::user_message(paste0("Processing game ", gid, "."), "done")
  }

  # Return only the columns needed for the report
  dplyr::select(result, "game_id", "qtr", "drive", "play_id", "play_type", "wp_archived", "wp_computed")
})

# Combine the per-game mismatch data frames
combined_results <- do.call(rbind, mismatch_dfs)

# Print all mismatches
print(combined_results, n = Inf)
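The comparison step in the script can also be tried on its own without downloading any data. The sketch below uses two small made-up data frames (the values are illustrative, not real play-by-play data) and base R only. It flags mismatches with an absolute tolerance instead of comparing rounded values, which is one possible alternative; the tolerance of 1e-6 and the names `archived_toy`, `computed_toy`, and `tol` are assumptions for illustration.

```r
# Toy stand-ins for the archived and recomputed play-by-play (made-up values)
archived_toy <- data.frame(
  game_id = "toy_game",
  play_id = c(1, 2, 3),
  wp = c(0.5000000, 0.4389834, 0.2552552)
)
computed_toy <- data.frame(
  game_id = "toy_game",
  play_id = c(1, 2, 3),
  wp = c(0.5000000, 0.3538505, 0.2552552)
)

# Join on the key columns; the shared `wp` column receives the two suffixes
merged_toy <- merge(
  archived_toy, computed_toy,
  by = c("game_id", "play_id"),
  suffixes = c("_archived", "_computed")
)

# Flag plays whose WP differs by more than a small absolute tolerance
tol <- 1e-6
gap <- abs(merged_toy$wp_archived - merged_toy$wp_computed)
mismatches <- merged_toy[gap > tol, ]

print(mismatches)
```

Only the second toy play is flagged, because it is the only one whose two values differ.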
Expected Behavior
I would expect the archived value to match the computed value.
nflverse_sitrep
> nflreadr::nflverse_sitrep()
── System Info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.3.2 (2023-10-31 ucrt) • Running under: Windows 11 x64 (build 22621)
── Package Status ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  package  installed cran  dev        behind
1 nflfastR 4.6.0     4.6.0 4.6.0.9000 dev
2 nflreadr 1.4.0     1.4.0 1.4.0.09   dev
── Package Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
• No options set for above packages
── Package Dependencies ──────────────────────────────────────────────────────────────────────────────────────────────────────────
• cachem (1.0.8) • listenv (0.9.0) • utf8 (1.2.3)
• cli (3.6.1) • lubridate (1.9.3) • vctrs (0.6.3)
• cpp11 (0.4.6) • magrittr (2.0.3) • withr (2.5.2)
• curl (5.1.0) • memoise (2.0.1) • xgboost (1.7.5.1)
• data.table (1.14.8) • parallelly (1.36.0) • codetools (0.2-19)
• digest (0.6.33) • pillar (1.9.0) • compiler (4.3.2)
• dplyr (1.1.3) • pkgconfig (2.0.3) • graphics (4.3.2)
• fansi (1.0.4) • progressr (0.14.0) • grDevices (4.3.2)
• fastmap (1.1.1) • purrr (1.0.2) • grid (4.3.2)
• fastrmodels (1.0.2) • R6 (2.5.1) • lattice (0.21-9)
• furrr (0.3.1) • rappdirs (0.3.3) • Matrix (1.6-1.1)
• future (1.33.0) • rlang (1.1.1) • methods (4.3.2)
• generics (0.1.3) • snakecase (0.11.1) • mgcv (1.9-0)
• globals (0.16.2) • stringi (1.8.1) • nlme (3.1-163)
• glue (1.6.2) • stringr (1.5.1) • parallel (4.3.2)
• hms (1.1.3) • tibble (3.2.1) • splines (4.3.2)
• janitor (2.2.0) • tidyr (1.3.0) • stats (4.3.2)
• jsonlite (1.8.7) • tidyselect (1.2.0) • tools (4.3.2)
• lifecycle (1.0.4) • timechange (0.2.0) • utils (4.3.2)
── Not Installed ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• nflseedR • nflplotR
• nfl4th • nflverse