JaseZiv/worldfootballR

Feature Request: FBref stats for all players in league from one page

mimburgi opened this issue · 3 comments

It would be helpful to be able to return stats for all players in a given league without having to scrape each team, similar to how it already works for the big five leagues.

At the least, I think this would reduce the scraping load on FBref, since a single page would be scraped instead of each team's page, which I think is what happens when you pass all of a league's team URLs to the current function. From a package user's standpoint, this also removes the need to first generate team URLs for the league. The main reason for the suggestion, though, is to reduce server load and so avoid timeouts when scraping several leagues at once. My guess is that it might also speed up the function.

For example, Eredivisie player shooting stats could be scraped from the player shooting tables on these pages:
https://fbref.com/en/comps/23/shooting/Eredivisie-Stats
https://fbref.com/en/comps/23/2021-2022/shooting/2021-2022-Eredivisie-Stats

Great suggestion!

The last time I checked (about two weeks ago), this data for leagues other than the big five couldn't be scraped without browser automation (Selenium). That doesn't package up nicely, so I haven't been able to solve it.

I wonder if we'll be able to solve this with the chromote branch of the rvest package. There seems to be a good amount of work going into it, so I imagine it will eventually make its way into the main branch of rvest.

Hey @mimburgi, we've added the fb_league_stats() function to pull this data. You should be prompted to install {chromote} and {R6} if you don't already have them.

# remotes::install_github("JaseZiv/worldfootballR")
library(worldfootballR)
packageVersion("worldfootballR")
#> [1] '0.6.2.6000'
library(dplyr)

eredivisie_player_shooting_2022 <- fb_league_stats(
  country = "NED",
  gender = "M",
  season_end_year = 2022,
  tier = "1st",
  non_dom_league_url = NA,
  stat_type = "shooting",
  team_or_player = "player"
)
glimpse(eredivisie_player_shooting_2022)
#> Rows: 533
#> Columns: 27
#> $ Rk                       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
#> $ Player                   <chr> "Trustin van 't Loo", "Dirk Abels", "Zakaria …
#> $ Nation                   <chr> "nl NED", "nl NED", "ma MAR", "se SWE", "be B…
#> $ Pos                      <chr> "MF", "DF,MF", "FW,MF", "FW,MF", "DF", "FW,MF…
#> $ Squad                    <chr> "Heerenveen", "Sparta R'dam", "AZ Alkmaar", "…
#> $ Age                      <int> 17, 24, 21, 19, 21, 22, 27, 19, 21, 22, 23, 2…
#> $ Born                     <int> 2004, 1997, 2000, 2002, 2000, 1998, 1994, 200…
#> $ Mins_Per_90              <dbl> 0.3, 32.2, 9.1, 15.0, 19.1, 8.1, 2.2, 17.4, 1…
#> $ Gls                      <int> 0, 0, 4, 0, 0, 0, 1, 6, 0, 0, 5, 0, 0, 0, 1, …
#> $ Sh_Standard              <int> 1, 14, 38, 24, 4, 19, 6, 31, 3, 7, 42, 4, 15,…
#> $ SoT_Standard             <int> 0, 4, 14, 8, 1, 4, 3, 15, 1, 1, 13, 1, 3, 1, …
#> $ SoT_percent_Standard     <dbl> 0.0, 28.6, 36.8, 33.3, 25.0, 21.1, 50.0, 48.4…
#> $ Sh_per_90_Standard       <dbl> 3.75, 0.43, 4.20, 1.60, 0.21, 2.36, 2.71, 1.7…
#> $ SoT_per_90_Standard      <dbl> 0.00, 0.12, 1.55, 0.53, 0.05, 0.50, 1.36, 0.8…
#> $ G_per_Sh_Standard        <dbl> 0.00, 0.00, 0.11, 0.00, 0.00, 0.00, 0.17, 0.1…
#> $ G_per_SoT_Standard       <dbl> NA, 0.00, 0.29, 0.00, 0.00, 0.00, 0.33, 0.40,…
#> $ Dist_Standard            <dbl> 16.1, 21.2, 14.5, 16.7, 12.4, 18.7, 12.4, 16.…
#> $ FK_Standard              <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ PK                       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ PKatt                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ xG_Expected              <dbl> 0.1, 0.8, 4.6, 2.0, 0.3, 2.2, 0.9, 4.1, 0.7, …
#> $ npxG_Expected            <dbl> 0.1, 0.8, 4.6, 2.0, 0.3, 2.2, 0.9, 4.1, 0.7, …
#> $ npxG_per_Sh_Expected     <dbl> 0.10, 0.06, 0.12, 0.08, 0.07, 0.13, 0.14, 0.1…
#> $ G_minus_xG_Expected      <dbl> -0.1, -0.8, -0.6, -2.0, -0.3, -2.2, 0.1, 1.9,…
#> $ `np:G_minus_xG_Expected` <dbl> -0.1, -0.8, -0.6, -2.0, -0.3, -2.2, 0.1, 1.9,…
#> $ Matches                  <chr> "Matches", "Matches", "Matches", "Matches", "…
#> $ url                      <chr> "https://fbref.com/en/comps/23/2021-2022/shoo…

eredivisie_player_shooting_2023 <- fb_league_stats(
  country = "NED",
  gender = "M",
  season_end_year = 2023,
  tier = "1st",
  non_dom_league_url = NA,
  stat_type = "shooting",
  team_or_player = "player"
)
glimpse(eredivisie_player_shooting_2023)
#> Rows: 441
#> Columns: 27
#> $ Rk                       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
#> $ Player                   <chr> "Dirk Abels", "Paulos Abraham", "Bobby Adekan…
#> $ Nation                   <chr> "nl NED", "se SWE", "nl NED", "be BEL", "nl N…
#> $ Pos                      <chr> "DF", "FW,MF", "FW,MF", "DF", "FW", "MF,FW", …
#> $ Squad                    <chr> "Sparta R'dam", "Groningen", "Go Ahead Eag", …
#> $ Age                      <chr> "25-216", "20-183", "23-335", "22-200", "20-3…
#> $ Born                     <int> 1997, 2002, 1999, 2000, 2002, 2000, 1994, 200…
#> $ Mins_Per_90              <dbl> 5.3, 5.4, 9.5, 16.0, 0.0, 0.2, 7.9, 0.1, 12.0…
#> $ Gls                      <int> 1, 0, 4, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
#> $ Sh_Standard              <int> 4, 11, 22, 9, 0, 2, 4, 0, 27, 5, 2, 1, 5, 0, …
#> $ SoT_Standard             <int> 1, 5, 8, 2, 0, 1, 1, 0, 11, 1, 0, 0, 1, 0, 1,…
#> $ SoT_percent_Standard     <dbl> 25.0, 45.5, 36.4, 22.2, NA, 50.0, 25.0, NA, 4…
#> $ Sh_per_90_Standard       <dbl> 0.76, 2.02, 2.31, 0.56, 0.00, 12.86, 0.50, 0.…
#> $ SoT_per_90_Standard      <dbl> 0.19, 0.92, 0.84, 0.12, 0.00, 6.43, 0.13, 0.0…
#> $ G_per_Sh_Standard        <dbl> 0.25, 0.00, 0.18, 0.11, NA, 0.50, 0.00, NA, 0…
#> $ G_per_SoT_Standard       <dbl> 1.0, 0.0, 0.5, 0.5, NA, 1.0, 0.0, NA, 0.0, 0.…
#> $ Dist_Standard            <dbl> 26.4, 17.3, 17.3, 9.0, NA, 9.2, 15.7, NA, 15.…
#> $ FK_Standard              <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ PK                       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ PKatt                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ xG_Expected              <dbl> 0.1, 1.1, 2.2, 1.4, 0.0, 0.2, 0.5, 0.0, 1.7, …
#> $ npxG_Expected            <dbl> 0.1, 1.1, 2.2, 1.4, 0.0, 0.2, 0.5, 0.0, 1.7, …
#> $ npxG_per_Sh_Expected     <dbl> 0.03, 0.10, 0.10, 0.15, NA, 0.10, 0.13, NA, 0…
#> $ G_minus_xG_Expected      <dbl> 0.9, -1.1, 1.8, -0.4, 0.0, 0.8, -0.5, 0.0, -1…
#> $ `np:G_minus_xG_Expected` <dbl> 0.9, -1.1, 1.8, -0.4, 0.0, 0.8, -0.5, 0.0, -1…
#> $ Matches                  <chr> "Matches", "Matches", "Matches", "Matches", "…
#> $ url                      <chr> "https://fbref.com/en/comps/23/shooting/Eredi…

Note that the function currently doesn't do any explicit type inference, so it will fail if you try to pull both 2021/22 and 2022/23 in a single call (i.e. season_end_year = 2022:2023): the Age column is an integer in the former and a character in the latter.
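One possible way around the Age mismatch is to pull each season in its own call and coerce Age to character before stacking the results. The sketch below uses two small toy data frames as stand-ins for the two fb_league_stats() results shown above (values copied from the glimpse output); harmonise_age() is a hypothetical helper, not part of worldfootballR.

```r
# Toy stand-ins for the two season pulls. In practice these would be the
# data frames returned by two separate fb_league_stats() calls.
season_2022 <- data.frame(
  Player = c("Trustin van 't Loo", "Dirk Abels"),
  Age = c(17L, 24L),                 # integer, as in the 2021/22 output
  stringsAsFactors = FALSE
)
season_2023 <- data.frame(
  Player = c("Dirk Abels", "Paulos Abraham"),
  Age = c("25-216", "20-183"),       # character, as in the 2022/23 output
  stringsAsFactors = FALSE
)

# Hypothetical helper: force Age to character so both seasons share a type.
harmonise_age <- function(df) {
  df$Age <- as.character(df$Age)
  df
}

# Coerce each season, then stack the rows.
combined <- do.call(rbind, lapply(list(season_2022, season_2023), harmonise_age))

# Optional: keep only the part before the first hyphen (FBref's years-days
# format) so that Age is comparable across seasons.
combined$Age_years <- as.integer(sub("-.*$", "", combined$Age))

str(combined)
```

Once the column types agree, dplyr::bind_rows() should work just as well as rbind() for the stacking step.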

Also, we've observed that this function will sometimes return 0 results. I think this has to do with the finicky nature of promises and the in-development {chromote} package. If it doesn't work on the first try, I'd just retry it. If it still doesn't work after several tries, then it might be a bug 🤷
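If you'd rather not retry by hand, a small wrapper along these lines could do it for you. This is only a sketch: retry_until_rows() is a hypothetical helper, not something the package provides, and the flaky_fetch() function below simply simulates a call that comes back empty twice before succeeding. The number of tries and the wait between them are arbitrary assumptions.

```r
# Hypothetical helper: call `fetch` until it returns a data frame with at
# least one row, or give up after `max_tries` attempts.
retry_until_rows <- function(fetch, max_tries = 3, wait_seconds = 5) {
  for (attempt in seq_len(max_tries)) {
    # Treat an error the same as an empty result and try again.
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (is.data.frame(result) && nrow(result) > 0) {
      return(result)
    }
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("No rows returned after ", max_tries, " attempts")
}

# Simulated flaky call: empty on the first two tries, one row on the third.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) data.frame() else data.frame(Player = "Dirk Abels")
}

shooting <- retry_until_rows(flaky_fetch, max_tries = 5, wait_seconds = 0)
```

With the real function, the fetch argument would be something like function() fb_league_stats(country = "NED", gender = "M", season_end_year = 2023, tier = "1st", non_dom_league_url = NA, stat_type = "shooting", team_or_player = "player"), with a wait of a few seconds between attempts to stay polite to FBref.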