fb_match_results returning NA for goals
DDE1989 opened this issue · 19 comments
I just updated my worldfootballR to the latest version and when I use the fb_match_results function I get values for all columns including expected goals, but home goals and away goals are returning NA
library(worldfootballR)
packageVersion("worldfootballR") #‘0.6.4.4’
test.results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
head(test.results)
# Competition_Name Gender Country Season_End_Year Round Wk Day Date Time Home HomeGoals
# 1 Premier League M ENG 2023 NA 1 Fri 2022-08-05 20:00 Crystal Palace NA
# 2 Premier League M ENG 2023 NA 1 Sat 2022-08-06 12:30 Fulham NA
# 3 Premier League M ENG 2023 NA 1 Sat 2022-08-06 15:00 Tottenham NA
# 4 Premier League M ENG 2023 NA 1 Sat 2022-08-06 15:00 Newcastle Utd NA
# 5 Premier League M ENG 2023 NA 1 Sat 2022-08-06 15:00 Leeds United NA
# 6 Premier League M ENG 2023 NA 1 Sat 2022-08-06 15:00 Bournemouth NA
# Home_xG Away AwayGoals Away_xG Attendance Venue Referee Notes
# 1 1.2 Arsenal NA 1.0 25286 Selhurst Park Anthony Taylor NA
# 2 1.2 Liverpool NA 1.2 22207 Craven Cottage Andy Madley NA
# 3 1.5 Southampton NA 0.5 61732 Tottenham Hotspur Stadium Andre Marriner NA
# 4 1.7 Nott'ham Forest NA 0.3 52245 St James' Park Simon Hooper NA
# 5 0.8 Wolves NA 1.3 36347 Elland Road Robert Jones NA
# 6 0.6 Aston Villa NA 0.7 11013 Vitality Stadium Peter Bankes NA
# MatchURL
# 1 https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League
# 2 https://fbref.com/en/matches/6713c1dc/Fulham-Liverpool-August-6-2022-Premier-League
# 3 https://fbref.com/en/matches/09d8a999/Tottenham-Hotspur-Southampton-August-6-2022-Premier-League
# 4 https://fbref.com/en/matches/1ac96eb4/Newcastle-United-Nottingham-Forest-August-6-2022-Premier-League
# 5 https://fbref.com/en/matches/82702941/Leeds-United-Wolverhampton-Wanderers-August-6-2022-Premier-League
# 6 https://fbref.com/en/matches/877e3193/Bournemouth-Aston-Villa-August-6-2022-Premier-League
Hmm this is weird - I'm not seeing this when I run the function.
I will leave this issue open for now in case anyone else experiences this. In the meantime, you can always use load_match_results():
loaded_results <- worldfootballR::load_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
I'm seeing the same error with the code fragment above, same version of the package:
packageVersion("worldfootballR") #‘0.6.4.4’
For me this started in the last couple of days, same code working fine before then
As I understand it from the docs, worldfootballR::load_match_results pulls from a cached version of the data off your GitHub? How often is that updated? I see the English National League results from Tuesday 26th September 2023 on the FBref site, but when I pull that league via the above method, they're still missing as of today (a couple of days later):
worldfootballR::load_match_results(country = "ENG", gender = "M", season_end_year = 2024, tier = "5th")
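A quick way to see how fresh a cached pull is would be to look at the latest date that has a recorded score. This is a minimal sketch, assuming the loaded data frame has the `Date` and `HomeGoals` columns shown in the output earlier in the thread; `latest_played_date()` is a hypothetical helper (not part of worldfootballR), and the toy data frame stands in for a real download.

```r
# Hypothetical helper: latest date with a non-missing home score
latest_played_date <- function(results) {
  played <- results[!is.na(results$HomeGoals), , drop = FALSE]
  if (nrow(played) == 0) return(as.Date(NA))
  max(as.Date(played$Date))
}

# Toy stand-in for the output of load_match_results()
toy <- data.frame(
  Date = as.Date(c("2023-09-23", "2023-09-26", "2023-09-30")),
  HomeGoals = c(2, NA, NA)
)

latest <- latest_played_date(toy)
print(latest)
```

If the latest played date is older than the most recent fixtures on the site, the cache has probably not been refreshed yet.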
Interesting... I still can't recreate this issue.. @tonyelhabr, are you able to return correct results as expected also?
In regards to your question @fine-lemur, the match results are updated based on the following CRON schedule (UTC):
on:
  schedule:
    - cron: "15 17 * 1-5,8-12 0,2,4"
So Sundays, Tuesdays and Thursdays.
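For anyone who wants to double-check that reading of the schedule, the five cron fields (minute, hour, day of month, month, day of week) can be decoded in a few lines of base R. This is only an illustrative sketch: `expand_field()` is a hypothetical helper that handles comma-separated values and simple ranges, not `*` or step syntax.

```r
cron <- "15 17 * 1-5,8-12 0,2,4"
fields <- strsplit(cron, " ", fixed = TRUE)[[1]]
names(fields) <- c("minute", "hour", "day_of_month", "month", "day_of_week")

# Expand a comma-separated field with optional ranges, e.g. "1-5,8-12"
expand_field <- function(field) {
  parts <- strsplit(field, ",", fixed = TRUE)[[1]]
  unlist(lapply(parts, function(p) {
    bounds <- as.integer(strsplit(p, "-", fixed = TRUE)[[1]])
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  }))
}

# In cron, day-of-week 0 is Sunday
weekdays_cron <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday")
run_days <- weekdays_cron[expand_field(fields[["day_of_week"]]) + 1]
run_months <- month.name[expand_field(fields[["month"]])]

print(run_days)
print(run_months)
```

Read this way, the job runs at 17:15 UTC on those three days, and the month field skips June and July.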
I am using a Mac, I wonder if that is causing the issue? Sometimes I get file encoding issues that I have to manually address in the code.
Closed by mistake
@dorronsoro1 i know you have a pretty recent version of the package, but can you re-install with the latest version (i.e. using remotes::install_github("JaseZiv/worldfootballR")) and try again? like @JaseZiv, i don't have any issue with NAs for the goals fields:
library(worldfootballR)
packageVersion("worldfootballR")
#> [1] '0.6.4.8'
results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
dplyr::glimpse(results)
#> Rows: 380
#> Columns: 20
#> $ Competition_Name <chr> "Premier League", "Premier League", "Premier League",…
#> $ Gender <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…
#> $ Country <chr> "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG…
#> $ Season_End_Year <int> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023,…
#> $ Round <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Wk <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "10…
#> $ Day <chr> "Fri", "Sat", "Sat", "Sat", "Sat", "Sat", "Sat", "Sun…
#> $ Date <date> 2022-08-05, 2022-08-06, 2022-08-06, 2022-08-06, 2022…
#> $ Time <chr> "20:00", "12:30", "15:00", "15:00", "15:00", "15:00",…
#> $ Home <chr> "Crystal Palace", "Fulham", "Tottenham", "Newcastle U…
#> $ HomeGoals <dbl> 0, 2, 4, 2, 2, 2, 0, 2, 1, 0, 2, 5, 4, 3, 0, 3, 2, 3,…
#> $ Home_xG <dbl> 1.2, 1.2, 1.5, 1.7, 0.8, 0.6, 0.7, 0.6, 1.4, 0.5, 1.1…
#> $ Away <chr> "Arsenal", "Liverpool", "Southampton", "Nott'ham Fore…
#> $ AwayGoals <dbl> 2, 2, 1, 0, 1, 0, 1, 2, 2, 2, 1, 1, 0, 0, 1, 1, 1, 2,…
#> $ Away_xG <dbl> 1.0, 1.2, 0.5, 0.3, 1.3, 0.7, 1.5, 0.8, 1.5, 2.2, 1.0…
#> $ Attendance <dbl> 25286, 22207, 61732, 52245, 36347, 11013, 39254, 3179…
#> $ Venue <chr> "Selhurst Park", "Craven Cottage", "Tottenham Hotspur…
#> $ Referee <chr> "Anthony Taylor", "Andy Madley", "Andre Marriner", "S…
#> $ Notes <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ MatchURL <chr> "https://fbref.com/en/matches/e62f6e78/Crystal-Palace…
I'm also using a Mac
I'm on a Mac also (Sonoma on M1 Pro):
results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
dplyr::glimpse(results)
I'm still getting NAs returned for goals (see screenshot below).
i'm guessing this has something to do with character encodings being different on different systems. in this case, the dash in the score (–, strictly an en dash, U+2013) can be problematic here.
can you run this code and let me know what the output looks like? i've printed mine.
example_score <- '2–1'
## current approach
iconv(example_score, 'utf-8', 'ascii', sub=' ')
#> [1] "2 1"
## potential alternative approach
gsub('–', ' ', example_score)
#> [1] "2 1"
When initially writing that function, I did have reservations about handling it by using iconv(example_score, 'utf-8', 'ascii', sub=' '), but found that explicitly including the en dash (–) in the .R file was causing my RStudio session to keep borking...
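On the worry about putting a literal dash character in the .R file: R's `\u` string escapes let the same pattern be written in pure ASCII. A minimal sketch, assuming the separator in the score string is U+2013 (which is what the example score in this thread contains); it illustrates the idea and is not necessarily what the package does.

```r
# Build the example score without any non-ASCII character in the source file
example_score <- paste0("2", "\u2013", "1")

# Inspect the code points: the middle one should be 8211 (0x2013)
code_points <- utf8ToInt(example_score)
print(code_points)

# Replace the dash using the escaped form, matched literally
cleaned <- gsub("\u2013", " ", example_score, fixed = TRUE)
print(cleaned)
```

Checking the code points first is a cheap way to confirm which dash a scraped string really contains before choosing a pattern.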
Below is the output when I run it:
> example_score <- '2–1'
> iconv(example_score, 'utf-8', 'ascii', sub=' ')
[1] "2-1"
> gsub('–', ' ', example_score)
[1] "2 1"
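The two machines disagree on the `iconv()` result: one prints "2 1" and the other "2-1". Any later step that expects one specific separator would fail on the other machine, which would be consistent with the NA goals, although the thread does not confirm that this is the cause. One separator-agnostic option is to split on any run of non-digits. `parse_score()` below is a hypothetical helper, not the package's implementation, and it assumes plain scores without extras such as penalty shoot-out annotations.

```r
# Hypothetical helper: split a score on any run of non-digit characters
parse_score <- function(score) {
  parts <- strsplit(score, "[^0-9]+")
  home <- vapply(parts, function(p) p[1], character(1))
  away <- vapply(parts, function(p) p[2], character(1))
  data.frame(HomeGoals = as.numeric(home), AwayGoals = as.numeric(away))
}

# The same score with an en dash, a hyphen, and a space as separator
scores <- c(paste0("2", "\u2013", "1"), "2-1", "2 1")
parsed <- parse_score(scores)
print(parsed)
```

Because the pattern does not name any particular dash, it behaves the same whether the dash survives, is transliterated to a hyphen, or is replaced by a space.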
I have experienced this issue once. I realized my default encoding in RStudio was not UTF-8, so I fixed that.
Ah, so it seems the gsub() solution might be worth using going forward. As one last check, @dorronsoro1 I'm curious to know what you see when you run this.
Sys.getenv("LC_COLLATE")
#> [1] ""
This is what I get when I run that command:
> Sys.getenv("LC_COLLATE")
[1] ""
I also tried changing the default encoding as mentioned above, saved a new script, and ran the lines previously mentioned, and still got NAs.
Sys.getenv("LC_COLLATE")
[1] ""
On my Mac also (running R in IntelliJ)
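An empty `LC_COLLATE` environment variable only shows that the variable is unset; it does not say which locale or encoding the session is actually using. A sketch of base R calls that report the effective settings, which may be more useful to compare across machines:

```r
# Collect the session's effective locale and encoding settings
locale_info <- list(
  ctype = Sys.getlocale("LC_CTYPE"),     # locale used for character handling
  utf8 = l10n_info()[["UTF-8"]],         # TRUE if the session is in a UTF-8 locale
  file_encoding = getOption("encoding")  # default encoding for file connections
)
str(locale_info)
```

Comparing this output between a machine that returns goals and one that returns NAs might narrow down whether the locale is involved.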
Ok so at this point I don't know exactly what the underlying issue is. My suspicion is that it has something to do with character encodings.
Anyways, the fix in #340 should resolve things. You can try it out for yourself right now (before the PR is merged) by installing the package with remotes::install_github("JaseZiv/worldfootballR@fix-na-goals").