JaseZiv/worldfootballR

`tm_player_transfer_history()` failing due to not being available in the HTML of transfermarkt

Closed this issue · 3 comments

Without some form of browser automation, `tm_player_transfer_history()` can no longer scrape player transfer histories in its current form.

I'm opening this issue and will try to incorporate the work @tonyelhabr did using chromote to obtain certain FBREF data points.

I tried out the chromote approach and found that I'm getting blocked upon loading a player URL.

```r
session <- worldfootballR:::worldfootballr_chromote_session("https://www.transfermarkt.com/cristiano-ronaldo/profil/spieler/8198")
session$session$view()
```

I did find that there is an API call we can make to get some of the transfer history elements, although I'm not sure how we'll get some things like `from_country` and `to_country`.

```r
library(worldfootballR)
library(httr)
#> Warning: package 'httr' was built under R version 4.2.3
headers <- c(
  `User-Agent` = getOption("worldfootballR.agent")
)

res <- httr::GET(
  url = "https://www.transfermarkt.com/ceapi/transferHistory/list/8198",
  httr::add_headers(.headers = headers)
)

cont <- httr::content(res)
transfers <- cont$transfers
str(transfers[1:2], max.level = 2)
#> List of 2
#>  $ :List of 12
#>   ..$ url               : chr "/cristiano-ronaldo/transfers/spieler/8198/transfer_id/4197140"
#>   ..$ from              :List of 7
#>   ..$ to                :List of 7
#>   ..$ futureTransfer    : int 0
#>   ..$ date              : chr "Jan 1, 2023"
#>   ..$ dateUnformatted   : chr "2023-01-01"
#>   ..$ upcoming          : logi FALSE
#>   ..$ season            : chr "22/23"
#>   ..$ marketValue       : chr "€20.00m"
#>   ..$ fee               : chr "-"
#>   ..$ showUpcomingHeader: logi FALSE
#>   ..$ showResetHeader   : logi FALSE
#>  $ :List of 12
#>   ..$ url               : chr "/cristiano-ronaldo/transfers/spieler/8198/transfer_id/4152208"
#>   ..$ from              :List of 7
#>   ..$ to                :List of 7
#>   ..$ futureTransfer    : int 0
#>   ..$ date              : chr "Nov 22, 2022"
#>   ..$ dateUnformatted   : chr "2022-11-22"
#>   ..$ upcoming          : logi FALSE
#>   ..$ season            : chr "22/23"
#>   ..$ marketValue       : chr "€20.00m"
#>   ..$ fee               : chr "-"
#>   ..$ showUpcomingHeader: logi FALSE
#>   ..$ showResetHeader   : logi FALSE
```
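
For what it's worth, here is a rough sketch of how the scalar fields in the `str()` output above could be flattened into one row per transfer. `flatten_transfers()` and the output column names are hypothetical, not part of worldfootballR, and the sample list only mirrors the fields shown above so that the snippet runs without hitting the site. The nested `from` and `to` lists are left out on purpose, since their contents aren't shown here.

```r
# Sample records mirroring the scalar fields from the str() output above.
# The nested `from`/`to` lists are omitted because their contents are not shown.
transfers <- list(
  list(
    url = "/cristiano-ronaldo/transfers/spieler/8198/transfer_id/4197140",
    dateUnformatted = "2023-01-01",
    season = "22/23",
    marketValue = "€20.00m",
    fee = "-"
  ),
  list(
    url = "/cristiano-ronaldo/transfers/spieler/8198/transfer_id/4152208",
    dateUnformatted = "2022-11-22",
    season = "22/23",
    marketValue = "€20.00m",
    fee = "-"
  )
)

# Hypothetical helper (not part of worldfootballR): one row per transfer.
flatten_transfers <- function(transfers) {
  # Pull one scalar field from every record, using NA when it is missing.
  pluck_chr <- function(field) {
    vapply(
      transfers,
      function(x) if (is.null(x[[field]])) NA_character_ else as.character(x[[field]]),
      character(1)
    )
  }

  data.frame(
    transfer_url = pluck_chr("url"),
    transfer_date = as.Date(pluck_chr("dateUnformatted")),  # ISO dates parse directly
    season = pluck_chr("season"),
    market_value = pluck_chr("marketValue"),  # kept as text, e.g. "€20.00m"
    fee = pluck_chr("fee"),                   # kept as text, "-" when no fee is listed
    stringsAsFactors = FALSE
  )
}

transfer_df <- flatten_transfers(transfers)
```

With a live response, `flatten_transfers(cont$transfers)` should work the same way, assuming the endpoint keeps returning the fields shown above. Values and fees are left as strings here; converting them to numbers would need its own parsing step.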

After searching GitHub, I found that a Python package made a similar fix in the past 2 weeks. Here is their code for scraping history.

Oh, so I think we can still get the "extra info" from server-side loaded data. So we may actually be capable of returning the same data from the function as before.