nflverse/nflfastR

[BUG] "number of items to replace is not a multiple of replacement length"

dennisbrookner opened this issue · 5 comments

Is there an existing issue for this?

  • I have searched the existing issues

Have you installed the latest development version of the package(s) in question?

  • I have installed the latest development version of the package.

What version of the package do you have?

4.5.1.9013

Describe the bug

Calling update_db() throws the following error message:

> update_db()
── Update nflfastR Play-by-Play Database ─────────────────────────────────────── nflfastR version 4.5.1.9013 ──
• 17:46:33 | Checking for missing completed games...
ℹ 17:46:34 | You have 6435 games and are missing 8.
• 17:46:34 | Start download of 8 games...
ℹ It is recommended to use parallel processing when trying to load multiple games.Please consider running `future::plan("multisession")`! Will go on sequentially...
✔ 17:46:39 | Download finished. Adding variables...
✔ 17:46:39 | added game variables
✔ 17:46:39 | added nflscrapR variables
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : 
  number of items to replace is not a multiple of replacement length
> 

This is true for:

  • update_db()
  • update_db(force_rebuild=2023)
  • update_db(force_rebuild=TRUE)
  • Deleting my pbp_db entirely and then running update_db()

Reprex

nflfastR::update_db()
#> ── Update nflfastR Play-by-Play Database ──────── nflfastR version 4.5.1.9013 ──
#> ℹ 17:51:21 | Can't find the data table "nflfastR_pbp"
#> in your database. Will load the play by play data from
#> scratch.
#> 
#> • 17:51:21 | Starting download of 25 seasons between 1999 and 2023...
#> 
#> • 17:54:38 | Checking for missing completed games...
#> 
#> ℹ 17:54:41 | You have 6435 games and are missing 8.
#> 
#> • 17:54:41 | Start download of 8 games...
#> 
#> ℹ It is recommended to use parallel processing when trying to load multiple games.Please consider running `future::plan("multisession")`! Will go on sequentially...
#> 
#> ✔ 17:54:48 | Download finished. Adding variables...
#> 
#> ✔ 17:54:48 | added game variables
#> 
#> ✔ 17:54:48 | added nflscrapR variables
#> [17:54:48] WARNING: amalgamation/../src/learner.cc:438: 
#>   If you are loading a serialized model (like pickle in Python, RDS in R) generated by
#>   older XGBoost, please export the model by calling `Booster.save_model` from that version
#>   first, then load it back in current version. See:
#> 
#>     https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
#> 
#>   for more details about differences between saving model and serializing.
#> Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp: number of items to replace is not a multiple of replacement length

Created on 2023-09-17 with reprex v2.0.2



Expected Behavior

update_db() should run and fetch the latest play-by-play data, rebuilding when requested.

nflverse_sitrep

nflverse_sitrep()
── System Info ────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.2.1 (2022-06-23) • Running under: macOS Ventura 13.1
── Package Status ─────────────────────────────────────────────────────────────────────────────────────────────
   package  installed  cran        dev behind
1   nfl4th      1.0.4 1.0.4 1.0.4.9000    dev
2 nflfastR 4.5.1.9013 4.5.1 4.5.1.9013       
3 nflplotR      1.1.0 1.1.0 1.1.0.9006    dev
4 nflreadr      1.4.0 1.4.0   1.4.0.03    dev
5 nflseedR      1.2.0 1.2.0      1.2.0       
6 nflverse      1.0.3 1.0.3      1.0.3       
── Package Options ────────────────────────────────────────────────────────────────────────────────────────────
• No options set for above packages
── Package Dependencies ───────────────────────────────────────────────────────────────────────────────────────
• askpass     (1.1)     • gsubfn     (0.7)       • proto        (1.0.0)    
• backports   (1.4.1)   • gtable     (0.3.1)     • purrr        (0.3.5)    
• cachem      (1.0.6)   • httr       (1.4.4)     • R6           (2.5.1)    
• cli         (3.4.1)   • isoband    (0.2.6)     • rappdirs     (0.3.3)    
• codetools   (0.2-18)  • janitor    (2.1.0)     • RColorBrewer (1.1-3)    
• colorspace  (2.0-3)   • jsonlite   (1.8.2)     • Rcpp         (1.0.9)    
• compiler    (4.2.1)   • labeling   (0.4.2)     • rlang        (1.0.6)    
• cpp11       (0.4.3)   • lattice    (0.20-45)   • rstudioapi   (0.14)     
• crayon      (1.5.2)   • lifecycle  (1.0.3)     • scales       (1.2.1)    
• curl        (4.3.3)   • listenv    (0.8.0)     • snakecase    (0.11.0)   
• data.table  (1.14.4)  • lubridate  (1.8.0)     • splines      (4.2.1)    
• digest      (0.6.30)  • magick     (2.7.3)     • stats        (4.2.1)    
• dplyr       (1.0.10)  • magrittr   (2.0.3)     • stringi      (1.7.8)    
• ellipsis    (0.3.2)   • MASS       (7.3-58.1)  • stringr      (1.4.1)    
• fansi       (1.0.3)   • Matrix     (1.5-1)     • sys          (3.4.1)    
• farver      (2.1.1)   • memoise    (2.0.1)     • tibble       (3.1.8)    
• fastmap     (1.1.0)   • methods    (4.2.1)     • tidyr        (1.2.1)    
• fastrmodels (1.0.2)   • mgcv       (1.8-40)    • tidyselect   (1.2.0)    
• furrr       (0.3.1)   • mime       (0.12)      • tools        (4.2.1)    
• future      (1.28.0)  • munsell    (0.5.0)     • utf8         (1.2.2)    
• generics    (0.1.3)   • nlme       (3.1-160)   • utils        (4.2.1)    
• ggplot2     (3.3.6)   • openssl    (2.0.4)     • vctrs        (0.4.2)    
• globals     (0.16.1)  • parallel   (4.2.1)     • viridisLite  (0.4.1)    
• glue        (1.6.2)   • parallelly (1.32.1)    • withr        (2.5.0)    
• graphics    (4.2.1)   • pillar     (1.8.1)     • xgboost      (1.6.0.1)  
• grDevices   (4.2.1)   • pkgconfig  (2.0.3)       
• grid        (4.2.1)   • progressr  (0.11.0)      
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
>

Screenshots

No response

Additional context

Full error traceback (it originates from a slightly nested call, but is analogous to the above):

Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
67.
predict.gam(object, newdata = newdata, type = type, se.fit = se.fit,
terms = terms, exclude = exclude, block.size = block.size,
newdata.guaranteed = newdata.guaranteed, na.action = na.action,
...)
66.
mgcv::predict.bam(fastrmodels::fg_model, newdata = pbp_data,
type = "response")
65.
add_ep_variables(.)
64.
pbp %>% add_ep_variables()
63.
add_ep(.)
62.
dplyr::filter(., !is.na(.data$air_yards))
61.
pbp %>% dplyr::filter(!is.na(.data$air_yards))
60.
nrow(pbp %>% dplyr::filter(!is.na(.data$air_yards)))
59.
add_air_yac_ep(.)
58.
nrow(pbp_data)
57.
add_wp_variables(.)
56.
pbp %>% add_wp_variables()
55.
add_wp(.)
54.
dplyr::filter(., !is.na(.data$air_yards))
53.
pbp %>% dplyr::filter(!is.na(.data$air_yards))
52.
nrow(pbp %>% dplyr::filter(!is.na(.data$air_yards)))
51.
add_air_yac_wp(.)
50.
dplyr::mutate(., receiver_player_name = stringr::str_extract(.data$desc,
"(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})[A-Z][A-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?"),
pass_middle = dplyr::if_else(.data$pass_location == "middle",
1, 0), air_is_zero = dplyr::if_else(.data$air_yards == ...
49.
dplyr::select(., "complete_pass", "air_yards", "yardline_100",
"ydstogo", "down1", "down2", "down3", "down4", "air_is_zero",
"pass_middle", "era2", "era3", "era4", "qb_hit", "home",
"outdoors", "retractable", "dome", "distance_to_sticks", ...
48.
pbp %>% dplyr::mutate(receiver_player_name = stringr::str_extract(.data$desc,
"(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})[A-Z][A-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?"),
pass_middle = dplyr::if_else(.data$pass_location == "middle",
1, 0), air_is_zero = dplyr::if_else(.data$air_yards == ...
47.
prepare_cp_data(pbp)
46.
add_cp(.)
45.
dplyr::mutate(., old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
44.
dplyr::group_by(., .data$game_id, .data$game_half)
43.
dplyr::mutate(., row = 1:dplyr::n(), new_drive = dplyr::if_else(.data$posteam !=
dplyr::lag(.data$posteam) | (.data$posteam != dplyr::lag(.data$posteam,
2) & is.na(dplyr::lag(.data$posteam))) | (.data$posteam !=
dplyr::lag(.data$posteam, 3) & is.na(dplyr::lag(.data$posteam, ...
42.
dplyr::group_by(., .data$game_id)
41.
dplyr::mutate(., fixed_drive = cumsum(.data$new_drive), tmp_result = dplyr::case_when(.data$touchdown ==
1 & .data$posteam == .data$td_team ~ "Touchdown", .data$touchdown ==
1 & .data$posteam != .data$td_team ~ "Opp touchdown", .data$field_goal_result ==
"made" ~ "Field goal", .data$field_goal_result %in% c("blocked", ...
40.
dplyr::group_by(., .data$game_id, .data$fixed_drive)
39.
dplyr::mutate(., fixed_drive_result = dplyr::if_else(dplyr::last(stats::na.omit(.data$tmp_result)) ==
"End of half", dplyr::first(stats::na.omit(.data$tmp_result)),
dplyr::last(stats::na.omit(.data$tmp_result))))
38.
dplyr::ungroup(.)
37.
dplyr::mutate(., posteam = .data$old_posteam)
36.
dplyr::select(., -"row", -"new_drive", -"tmp_result", -"old_posteam")
35.
d %>% dplyr::mutate(old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
34.
add_drive_results(.)
33.
dplyr::mutate(., old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
32.
dplyr::group_by(., .data$game_id, .data$game_half)
31.
dplyr::mutate(., row = 1:dplyr::n(), new_series = dplyr::if_else(.data$fixed_drive !=
dplyr::lag(.data$fixed_drive) | ((dplyr::lag(.data$first_down_rush) ==
1 | dplyr::lag(.data$first_down_pass) == 1 | dplyr::lag(.data$first_down_penalty) ==
1) & dplyr::lag(.data$touchdown) == 0) | .data$row == 1, ...
30.
dplyr::group_by(., .data$game_id)
29.
dplyr::mutate(., series = cumsum(.data$new_series), tmp_result = dplyr::case_when((.data$first_down_penalty ==
1 | .data$first_down_rush == 1 | .data$first_down_pass ==
1) & touchdown == 0 ~ "First down", .data$touchdown == 1 &
.data$posteam == .data$td_team ~ "Touchdown", .data$touchdown == ...
28.
dplyr::group_by(., .data$game_id, .data$series)
27.
dplyr::mutate(., series_result = dplyr::if_else(dplyr::last(stats::na.omit(.data$tmp_result)) ==
"End of half", dplyr::first(stats::na.omit(.data$tmp_result)),
dplyr::last(stats::na.omit(.data$tmp_result))), series_success = dplyr::if_else(.data$series_result %in%
c("Touchdown", "First down"), 1, 0))
26.
dplyr::ungroup(.)
25.
dplyr::mutate(., posteam = .data$old_posteam)
24.
dplyr::select(., -"row", -"tmp_result", -"new_series", -"old_posteam")
23.
pbp %>% dplyr::mutate(old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
22.
add_series_data(.)
21.
dplyr::select(., tidyselect::any_of(c(nflscrapr_cols, new_cols,
api_cols)))
20.
pbp %>% dplyr::select(tidyselect::any_of(c(nflscrapr_cols, new_cols,
api_cols)))
19.
withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
18.
suppressWarnings(out <- pbp %>% dplyr::select(tidyselect::any_of(c(nflscrapr_cols,
new_cols, api_cols))))
17.
select_variables(.)
16.
pbp %>% add_game_data(...) %>% add_nflscrapr_mutations() %>%
add_ep() %>% add_air_yac_ep() %>% add_wp() %>% add_air_yac_wp() %>%
add_cp() %>% add_drive_results() %>% add_series_data() %>%
select_variables()
15.
withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
14.
suppressWarnings({
p <- progressr::progressor(along = game_ids)
pbp <- furrr::future_map_dfr(game_ids, function(x, p, dir,
...) { ...
13.
fast_scraper(game_ids = game_ids, dir = dir, ..., in_builder = builder)
12.
nrow(pbp)
11.
clean_pbp(., in_builder = builder)
10.
nrow(pbp)
9.
add_qb_epa(., in_builder = builder)
8.
nrow(pbp)
7.
add_xyac(., in_builder = builder)
6.
nrow(pbp)
5.
add_xpass(., in_builder = builder)
4.
fast_scraper(game_ids = game_ids, dir = dir, ..., in_builder = builder) %>%
clean_pbp(in_builder = builder) %>% add_qb_epa(in_builder = builder) %>%
add_xyac(in_builder = builder) %>% add_xpass(in_builder = builder)
3.
build_nflfastR_pbp(missing, rules = FALSE)
2.
update_db(force_rebuild = force_rebuild) at puntr_extras.R#37
1.
get_punts(years = 2021:2023, include_blocks = TRUE, seasontype = "REG")
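
For readers unfamiliar with the message at frame 67: the text is base R's matrix sub-assignment error, raised when the replacement cannot be recycled evenly into the selected cells. The sketch below reproduces the same message with toy data only. The matrix `X` and its dimensions are made up for illustration, and this is not the mgcv code path or a diagnosis of why it fails there.

```r
# Toy 4 x 3 matrix; it only stands in for the `X` named in the error message.
X <- matrix(0, nrow = 4, ncol = 3)

# Selecting two columns gives 8 cells; a 4 x 2 replacement fits exactly.
X[, 1:2] <- matrix(1, nrow = 4, ncol = 2)

# A replacement of length 3 cannot be recycled evenly into 8 cells, so R
# raises an error. tryCatch() captures the message instead of stopping.
msg <- tryCatch(
  {
    X[, 1:2] <- 1:3
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

In the traceback above the assignment happens inside predict.gam(), so a shape mismatch of this kind would be between the stored model object and what the installed code builds from the new data, rather than anything in the calling script.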

Oh also! Earlier in the afternoon, I was getting this error along with a message that the KC JAX game from today wasn't yet available. I'm not seeing that anymore, but it leads me to believe that different data sources disagree about which games have finished.

Please try updating nflreadr to the dev version and trying again.

I'm confused about this, because I believe that I did that:

> nflverse::nflverse_update(devel = TRUE)
ℹ The following packages are out of date:
• nfl4th   (1.0.4 -> 1.0.4.9000)
• nflplotR (1.1.0 -> 1.1.0.9006)
• nflreadr (1.4.0 -> 1.4.0.3   )

but isn't the update_db() function part of nflfastR? Do I need to update that too? Or am I using the wrong function?

EDIT: Is the above call not actually updating anything, just checking versions?

Yay that worked, thanks!! That's my bad, I just skimmed the output of nflverse::nflverse_update(devel = TRUE) and didn't realize that things were out of date!

Glad it worked!
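
For anyone landing here with the same confusion, a minimal sketch of the version check follows. It uses only base R and the two nflreadr version strings from the sitrep output in this issue, so it runs without any nflverse package installed. The commented-out lines are assumptions: they presume nflreadr is installed and that the nflverse r-universe repository is where the dev build comes from.

```r
# Version strings copied from the nflverse_sitrep() table in this issue.
installed <- package_version("1.4.0")     # nflreadr, "installed" column
dev       <- package_version("1.4.0.03")  # nflreadr, "dev" column

# TRUE means the installed copy is older than the dev build.
needs_update <- installed < dev
print(needs_update)

# Version components are compared as integers, so the "1.4.0.03" shown by
# the sitrep and the "1.4.0.3" shown by nflverse_update() are the same version.
same_version <- dev == package_version("1.4.0.3")
print(same_version)

# To check a real installation instead of the copied strings:
# utils::packageVersion("nflreadr")

# Assumed install route for the dev build (verify the repository URL first):
# install.packages("nflreadr",
#                  repos = c("https://nflverse.r-universe.dev", getOption("repos")))
```

The point is only that a listing of out-of-date packages is a report, not an install, so the installed version is worth re-checking before re-running update_db().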