JaseZiv/worldfootballR

fb_league_stats won't return the tables when the stats are hidden

theblindshadowyt opened this issue · 3 comments

I'm having trouble using fb_league_stats to get the stats for the 2022 season of the Argentinian First Division.

Basically the stats at player level are present in the site, but are hidden behind a button that says "Show player 'stat' "

example link: https://fbref.com/en/comps/21/2022/shooting/2022-Primera-Division-Stats

This doesn't happen with other seasons. Season 2022 is the only one with this hide/show button

In R:

library(worldfootballR)
packageVersion("worldfootballR")

data = fb_league_stats(
  country = "ARG",
  gender = "M",
  season_end_year = 2022,
  tier = "1st",
  non_dom_league_url = NA,
  stat_type = "shooting",
  team_or_player = "player"
)

sessionInfo()

The error I'm getting is: In f(...) :
Did not find the expected number of tables on the page (3). Found 2.

It looks as though this is in fact only a problem for some previous seasons.

Would this be an easy fix @tonyelhabr?

Ugh, this is tricky. I tried "clicking" the show/hide button via chromote, but that didn't seem to work.

Another solution would be to parse out the table from an HTML comment that is stored alongside the show/hide element.
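As a rough illustration of the comment-parsing idea, here is a minimal sketch using xml2 and rvest. The HTML string is a made-up stand-in for the page (the ids, column names, and values are placeholders, not FBref's actual markup), and it assumes the hidden table sits inside an ordinary HTML comment that is present in the page source.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for the page: one visible table and one table
# wrapped in an HTML comment (placeholder ids and values).
page_html <- '
<html><body>
  <div id="all_squads">
    <table id="squads">
      <thead><tr><th>Squad</th><th>Sh</th></tr></thead>
      <tbody><tr><td>Team A</td><td>100</td></tr></tbody>
    </table>
  </div>
  <div id="all_players">
    <!--
    <table id="players">
      <thead><tr><th>Player</th><th>Sh</th></tr></thead>
      <tbody>
        <tr><td>Player A</td><td>10</td></tr>
        <tr><td>Player B</td><td>7</td></tr>
      </tbody>
    </table>
    -->
  </div>
</body></html>'

page <- read_html(page_html)

# Collect every comment node and keep only those whose text contains a table.
comment_text <- xml_text(xml_find_all(page, "//comment()"))
table_comments <- comment_text[grepl("<table", comment_text, fixed = TRUE)]

# Re-parse each comment body as HTML and extract its table.
hidden_tables <- lapply(table_comments, function(txt) {
  html_table(html_element(read_html(trimws(txt)), "table"))
})

print(hidden_tables[[1]])
```

On the real page the comment would have to come from the rendered source (for example, whatever the chromote session returns), and the header rows are more complex than in this toy input, so the column names would still need cleaning afterwards.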


In fact, parsing out the table embedded in the comment should be a generalizable solution (which is preferable, since I'd rather not have very hacky solutions in the code-base), but I'm a little stuck at the moment with getting the column names in the right format.

Here's the code I've got at this point. This is mostly the same code already used by the fb_league_stats function.

url <- "https://fbref.com/en/comps/21/2022/shooting/2022-Primera-Division-Stats"
fi <- purrr::insistently(worldfootballR:::worldfootballr_chromote_session)
session <- fi(url)
## find element "above" commented out table
node_idx1 <- session$find_nodes("#stats_shooting_sh")
## find element "below" commented out table
node_idx2 <- session$find_nodes("#stats_shooting_control")
## find commented out element in-between
node_idx <- round((node_idx1 + node_idx2) / 2)
elements <- session$session$Runtime$callFunctionOn("function() { return this.textContent }", session$object_id(node_idx))
html <- paste0("<html>", paste0(elements, collapse = "\n"), "</html>")
player_table <- xml2::read_html(html)
session$session$close(wait_ = FALSE)
player_table_elements <- xml2::xml_children(xml2::xml_children(player_table))
parsed_player_table <- rvest::html_table(player_table_elements)
parsed_player_table[[2]][1, ] |> t()
         [,1]
         "RankThis is a count of the rows from top to bottom.It is recalculated following the sorting of a column.\\\" >Rk"
Standard "Player"
Expected "First, we check our records in international play at senior level.Then youth level.Then citizenship presented on wikipedia.Finally, we use their birthplace when available.\\\" >Nation"
         "PositionPosition most commonly played by the playerGK - GoalkeepersDF - DefendersMF - MidfieldersFW - ForwardsFB - FullbacksLB - Left BacksRB - Right BacksCB - Center BacksDM - Defensive MidfieldersCM - Central MidfieldersLM - Left MidfieldersRM - Right MidfieldersWM - Wide MidfieldersLW - Left WingersRW - Right WingersAM - Attacking Midfielders\\\" >Pos"
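One possible way to recover the short column labels from output like the above is to keep only the text after the last `>` in each header cell. This is a sketch based solely on the strings printed here (tooltip text shortened for readability), and `clean_header_label` is a hypothetical helper, not part of worldfootballR. It assumes the tooltip text always precedes the label and that the label itself never contains `>`.

```r
# Header cells as printed above, with the tooltip text shortened.
raw_header <- c(
  "RankThis is a count of the rows from top to bottom.\\\" >Rk",
  "Player",
  "First, we check our records in international play.\\\" >Nation",
  "PositionPosition most commonly played by the player\\\" >Pos"
)

# Hypothetical helper: drop everything up to and including the last ">",
# plus any whitespace that follows it. Cells without ">" are left unchanged.
clean_header_label <- function(x) {
  sub("^.*>\\s*", "", x)
}

clean_labels <- clean_header_label(raw_header)
print(clean_labels)
```

This only addresses the tooltip text fused into each label. The over-header row ("Standard", "Expected") appears misaligned in the output above and would need separate handling.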

Depending on how urgent this is @theblindshadowyt, you may have better luck reaching out to FBref and asking them why the table is hidden specifically on this page. They may realize it's a bug and unhide it, which solves your problem without any code changes. (I can't seem to find other league-seasons with this same behavior, so I suspect this may be a bug of some kind.)