Updated england_current() function
JoGall opened this issue · 8 comments
Just a few changes to this function as NAs were being returned. Not long left for this season now but function can easily be updated next season (i.e. update .csv links and change 'Season' to 2017).
'Date' as Date class instead of character; 'division' and 'tier' changed to extract the numeric part from the string and prevent NAs being returned (e.g. E0 -> 1); call to the teamnames dataframe to replace team name variants with the main name used in the england dataframe (e.g. "Man City" -> "Manchester City").
england_current <- function() {
  # *update each season*
  df1 <- rbind(
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E0.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E1.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E2.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E3.csv")
  )
  # convert division names to numeric (e.g. "E0" -> 1); as.character() lets
  # strsplit() work whether Div was read in as a factor or as character
  tier <- as.numeric(sapply(strsplit(as.character(df1$Div), ""), "[[", 2)) + 1
  df2 <- data.frame(
    "Date" = as.Date(df1$Date, "%d/%m/%y"),
    "Season" = rep(2016, nrow(df1)), # *update each season*
    "home" = df1$HomeTeam,
    "visitor" = df1$AwayTeam,
    "FT" = paste0(df1$FTHG, "-", df1$FTAG),
    "hgoal" = df1$FTHG,
    "vgoal" = df1$FTAG,
    "division" = tier,
    "tier" = tier,
    "totgoal" = df1$FTHG + df1$FTAG,
    "goaldif" = df1$FTHG - df1$FTAG,
    "result" = df1$FTR
  )
  # replace any new team name variants with pre-existing names (e.g. "Man City" -> "Manchester City")
  df2$home <- teamnames$name[match(df2$home, teamnames$name_other)]
  df2$visitor <- teamnames$name[match(df2$visitor, teamnames$name_other)]
  return(df2)
}
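For anyone reading along, here is a minimal sketch of how the match() lookup in the last step behaves. The teamnames_toy table and its values are made up for illustration and are not the package's real teamnames data; the fallback at the end is an optional extra, not something the function above does.

```r
# Toy stand-in for the teamnames data frame (illustrative values only)
teamnames_toy <- data.frame(
  name       = c("Manchester City", "Manchester City", "Arsenal"),
  name_other = c("Man City", "Manchester City", "Arsenal"),
  stringsAsFactors = FALSE
)

home <- c("Man City", "Arsenal", "Unknown FC")

# match() gives the row index of each variant, or NA when there is no match
idx <- match(home, teamnames_toy$name_other)
standardised <- teamnames_toy$name[idx]

# Optional fallback (an assumption, not in the function above):
# keep the original spelling where no mapping exists
standardised_safe <- ifelse(is.na(idx), home, standardised)
```

Note that a name missing from name_other comes back as NA, so a newly promoted team with an unseen spelling would need adding to the lookup table first.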
The current version of this function on GitHub doesn't appear to return NAs. As far as I can see, all the improvements were already in that function except for ensuring that Date is of class Date. Or did I miss something?
I think what to do with this function in the off-season is a puzzle. Probably best to just leave it as is and change it on the first day of the new season? Or maybe add a warning?
Ah something went wrong on my end, I didn't realise the current GitHub version had the call to the teamnames dataframe already.
I'm still getting NAs returned when I run the latest version though. I think it's because the function tries to convert the division and tier to numeric directly ("division" = as.numeric(df$Div), "tier" = as.numeric(df$Div)), but the variable 'Div' also contains a letter (i.e. E0, E1...). I think the numeric part has to be extracted first with strsplit, gsub, etc.
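A minimal sketch of the problem described here, assuming Div arrives as a plain character vector; the variable names are just for illustration.

```r
div <- c("E0", "E1", "E2", "E3")

# Direct conversion of character codes fails: every element becomes NA
# (R also emits a coercion warning, suppressed here)
direct <- suppressWarnings(as.numeric(div))

# Strip everything that is not a digit, convert, then shift so that E0 -> 1
tier <- as.numeric(gsub("[^0-9]", "", div)) + 1
```

The gsub version does not care whether the column is character or factor, since gsub coerces its input to character.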
Interesting - I can't reproduce that error, but I will look into it. I'll be overhauling the other functions tonight as well, so hopefully I can track it down.
OK, the reason this should work is that E0, E1, E2, E3 are brought in as factors, and as.numeric then just returns the integer code of each factor level (E0 is 1, E1 is 2, and so on). To ensure it will work, I will just wrap the variable in factor - that ought to do it.
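A small sketch of that factor behaviour, using made-up vectors rather than the downloaded data. The explicit levels argument at the end is a suggestion of mine, not something proposed in the thread.

```r
div_chr <- c("E0", "E1", "E2", "E3", "E0")

# factor() sorts the levels alphabetically, so as.numeric() returns their codes
codes <- as.numeric(factor(div_chr))

# Caveat: codes depend on which levels are present in the data
partial <- as.numeric(factor(c("E0", "E2")))

# Fixing the levels explicitly keeps E2 mapped to 3 even if E1 is absent
fixed <- as.numeric(factor(c("E0", "E2"), levels = c("E0", "E1", "E2", "E3")))
```

With all four divisions present the two approaches agree; they only differ if a division file is missing or empty.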
Ah, I've just realised why then: I had options(stringsAsFactors = FALSE) in my .Rprofile! I'm going to remove the line from my .Rprofile to make sure my code is portable in future, but it's probably a good idea to explicitly make it a factor in the function for others who might have the option set.
Also, I hadn't thought of what to do with this function during the off-season... Could we maybe check whether the england dataframe is already up-to-date before running? Something like:
england_current <- function() {
  df <- rbind(
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E0.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E1.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E2.csv"),
    read.csv("http://www.football-data.co.uk/mmz4281/1617/E3.csv")
  )
  if (identical(max(as.Date(df$Date, "%d/%m/%y")), max(england$Date))) {
    # message about being up to date
  } else {
    # rest of function
  }
}
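To make the date check easier to try without downloading anything, here is a rough sketch of the comparison as a standalone helper. The name is_up_to_date and the sample dates are hypothetical, and it assumes the stored dates are Date objects or ISO-formatted strings.

```r
# Hypothetical helper: TRUE when the download holds no match later than the
# most recent one already stored
is_up_to_date <- function(new_dates, stored_dates, format = "%d/%m/%y") {
  newest_download <- max(as.Date(new_dates, format = format), na.rm = TRUE)
  newest_stored   <- max(as.Date(stored_dates), na.rm = TRUE)
  newest_download <= newest_stored
}

downloaded <- c("13/08/16", "21/05/17")                     # format used in the csv files
stored_old <- as.Date(c("2016-08-13", "2017-04-30"))        # behind the download
stored_new <- as.Date(c("2016-08-13", "2017-05-21"))        # already current

behind  <- is_up_to_date(downloaded, stored_old)
current <- is_up_to_date(downloaded, stored_new)
```

This uses <= rather than identical() so the check still passes if the stored data happens to be ahead of the download; whether that is the right behaviour is a judgment call.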
I think lots of people have that in their .Rprofile, so it's a good idea to make sure the code is robust to it.
Good idea for the function date check - I will implement that.