JaseZiv/worldfootballR

I found a bug in the function fb_league_stats()

Closed this issue · 7 comments

Example Code
test <- fb_league_stats(
  country = "ENG",
  gender = "M",
  season_end_year = 2023,
  tier = "1st",
  stat_type = "passing_types",
  team_or_player = "player",
  time_pause = 3,
  rate = purrr::rate_backoff(max_times = 3)
)

I don't think this is really a bug. This function is "experimental" and can be highly dependent on:

  1. your internet connection
  2. FBRef's server latency

On the backend, we use chromote and have to wait for the table to be loaded via JavaScript on the page. This is the only function like this in the package AFAIK.

For transparency, I basically copied over a lot of the rvest / promises logic for fb_league_stats() (see worldfootballr_chromote_session()) from this branch in the {rvest} package. That branch has been in draft state for months, seemingly because it has never worked completely properly.

I can also reproduce it today, although yesterday there was no issue.
Strangely, the function works for team but not for player, even with a stat_type such as keepers, where the tables are of comparable size.

For team:

fb_league_stats(country = "ENG", gender = "M", season_end_year = 2024, tier = "1st", non_dom_league_url = NA,
  stat_type = "keepers",
  team_or_player = "team"
)
# A tibble: 20 × 23
   Team_or_Opponent Squad  Num_Players `MP_Playing Time` `Starts_Playing Time` `Min_Playing Time` Mins_Per_90_Playing …¹
   <chr>            <chr>        <int>             <int>                 <int>              <dbl>                  <dbl>
 1 team             Arsen…           2                15                    15               1350                     15
 2 team             Aston…           2                15                    15               1350                     15
........................

For player:

fb_league_stats(country = "ENG", gender = "M", season_end_year = 2024, tier = "1st", non_dom_league_url = NA,
  stat_type = "keepers",
  team_or_player = "player"
)
Error: Request failed after 3 attempts.
# A tibble: 0 × 1
# ℹ 1 variable: url <chr>

I've intentionally left this issue open to give notice that we're aware that this function is unreliable. I'm not sure there's a great solution without using Selenium, which we've implicitly decided to not use, so as to reduce the scope of this package.

I think the reason why the function sometimes works but sometimes doesn't is due to server latency on FBref's end, but I'm not entirely sure. The function relies on {promises} for async behavior, which can be very dependent on Internet connection and server response time.
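A minimal sketch of the retry behaviour implied by the `rate = purrr::rate_backoff(max_times = 3)` argument in the original report, assuming the retries are done with something like purrr::insistently() (the "Request failed after 3 attempts" message is consistent with that, but this is not confirmed here). `flaky_fetch()` is a hypothetical stand-in for the real scrape; it fails once and then succeeds, so the example runs offline.

```r
library(purrr)

# Hypothetical stand-in for a scrape that fails on the first attempt
# (e.g. the table has not been rendered yet) and succeeds on the second.
attempts <- 0
flaky_fetch <- function() {
  attempts <<- attempts + 1
  if (attempts < 2) stop("table not loaded yet")
  "table loaded"
}

# Retry up to 3 times, with exponential backoff between attempts.
# Short pauses are used here only so that the example finishes quickly.
rate <- rate_backoff(pause_base = 0.1, pause_min = 0.01, max_times = 3)
fetch_with_retry <- insistently(flaky_fetch, rate = rate, quiet = TRUE)

result <- fetch_with_retry()
print(result)
```

If every attempt fails, the wrapped function raises an error once `max_times` is reached, so a slow page load can exhaust the retries no matter how the pauses are tuned.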

Thanks for the answer @tonyelhabr!
One more note that may help in solving this issue: I found a workaround by using fb_big5_advanced_season_stats(season_end_year= c(2024), stat_type= "shooting", team_or_player= "player") instead and filtering by the league name afterwards. The fact that fb_big5_advanced_season_stats is working while fb_league_stats isn't might indicate that this is not a server latency issue. However, this is only my guess 😃
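The workaround described above could look roughly like the sketch below. `filter_league()` is a hypothetical helper, and the league column name `Comp` is an assumption about the Big 5 output; check `names()` on the real data frame. Toy data is used so the sketch runs without network access.

```r
# Keep only the rows for one league from a Big 5 stats data frame.
# `comp_col` is an assumption: the league column is believed to be "Comp".
filter_league <- function(df, league, comp_col = "Comp") {
  stopifnot(comp_col %in% names(df))
  # which() drops rows where the league value is NA
  df[which(df[[comp_col]] == league), , drop = FALSE]
}

# Toy data standing in for the real output.
toy <- data.frame(
  Player = c("A", "B", "C"),
  Comp = c("Premier League", "La Liga", "Premier League"),
  stringsAsFactors = FALSE
)
epl <- filter_league(toy, "Premier League")
print(epl)

# With the real package (needs network access):
# big5 <- worldfootballR::fb_big5_advanced_season_stats(
#   season_end_year = 2024, stat_type = "shooting", team_or_player = "player"
# )
# epl <- filter_league(big5, "Premier League")
```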

I'm glad that works for you! That outcome is not unexpected to me. fb_big5_advanced_season_stats() doesn't require any special backend logic (i.e. promises) because FBref loads that data on the server side. The individual league player stats are loaded on the client side (in the browser), so they can't be scraped in the typical manner (i.e. with rvest::read_html()).
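To illustrate the server-side versus client-side distinction, here is a small offline sketch using two made-up HTML snippets (not FBref's actual markup). rvest::read_html() only parses the HTML it receives and never runs JavaScript, so a table that a script would build in the browser is not there to be found.

```r
library(rvest)

# Server-rendered page: the table is already in the downloaded HTML.
static_page <- read_html(
  "<html><body><table><tr><th>Player</th></tr><tr><td>A</td></tr></table></body></html>"
)

# Client-rendered page: the HTML holds only an empty container. A script
# would build the table later in a browser, but read_html() never runs it.
dynamic_page <- read_html(
  "<html><body><div id='stats'></div><script>/* builds the table in the browser */</script></body></html>"
)

n_static <- length(html_elements(static_page, "table"))
n_dynamic <- length(html_elements(dynamic_page, "table"))
print(c(static = n_static, dynamic = n_dynamic))
```

This is why a headless browser such as chromote is needed for the second kind of page: the scraper has to wait until the script has inserted the table before reading the DOM.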

Issue is being archived.