Compare values of two columns based on complex conditions
josmos opened this issue · 3 comments
I have a rather complex function comparing (possibly partial) date strings:
library(stringr)  # provides str_starts() and fixed()

compare_partial_dates <- function(date1, date2, missing_value_pattern = "nk", sep = ".") {
  # prefixes that mark which parts of a date are unknown
  no_y_pat <- paste(missing_value_pattern, missing_value_pattern, missing_value_pattern, sep = sep) # nk.nk.nk
  no_m_pat <- paste(missing_value_pattern, missing_value_pattern, "", sep = sep) # nk.nk.%Y
  no_d_pat <- paste(missing_value_pattern, "", sep = sep) # nk.%m.%Y
  # fixed() matches the prefix literally, so "." is not treated as a regex wildcard
  if (is.na(date1) || is.na(date2)) {
    # missing date: no comparison possible
    return(TRUE)
  } else if (str_starts(date1, fixed(no_y_pat)) || str_starts(date2, fixed(no_y_pat))) {
    # nk.nk.nk: no comparison possible
    return(TRUE)
  } else if (str_starts(date1, fixed(no_m_pat)) || str_starts(date2, fixed(no_m_pat))) {
    # missing month: set both dates to 01.01.%Y
    date1 <- paste("01", "01", substr(date1, nchar(date1) - 3, nchar(date1)), sep = ".")
    date2 <- paste("01", "01", substr(date2, nchar(date2) - 3, nchar(date2)), sep = ".")
  } else if (str_starts(date1, fixed(no_d_pat)) || str_starts(date2, fixed(no_d_pat))) {
    # missing day: set both dates to 01.%m.%Y
    date1 <- paste("01", substr(date1, nchar(date1) - 6, nchar(date1)), sep = ".")
    date2 <- paste("01", substr(date2, nchar(date2) - 6, nchar(date2)), sep = ".")
  }
  # convert to Date
  date1 <- as.Date(strptime(date1, format = "%d.%m.%Y", tz = "UTC"))
  date2 <- as.Date(strptime(date2, format = "%d.%m.%Y", tz = "UTC"))
  # print(paste(date1, operator, date2, sep = " "))
  # compare the two dates:
  return(date1 <= date2)
}
I have a lot of date columns to compare. Writing a rule with simple expressions for each column combination would be a mess.
Is it possible to make this comparison with validate using a function like this (or a similar one)? How could this be implemented?
Hi there, for any function f(...) that returns a logical vector, you can create a rule like this:
rules <- validator( f(x,z) == TRUE)
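One thing to watch: a function built on `if` and `||` works on single values, whereas a rule is evaluated on whole columns. Here is a minimal sketch of one way to wrap such a scalar check so that it returns a logical vector, using base R's mapply(). The names scalar_check and vector_check are hypothetical and only stand in for the real comparison:

```r
# Hypothetical scalar check: TRUE when either value is missing or a <= b.
scalar_check <- function(a, b) {
  if (is.na(a) || is.na(b)) {
    return(TRUE)
  }
  a <= b
}

# Apply the scalar check to each pair of elements.
# mapply() walks over both vectors in parallel and simplifies the result
# to a logical vector; unname() drops any names it may attach.
vector_check <- function(a, b) {
  unname(mapply(scalar_check, a, b))
}

res <- vector_check(c(1, 5, NA), c(3, 3, 3))
print(res)  # TRUE FALSE TRUE
```

The same wrapping should apply to a string-based comparison, since mapply() only needs the wrapped function to accept one pair at a time.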
If you need to compare, say, variables x and y to z, then you could use a variable group like so:
rules <- validator(
  G := var_group(x, y),
  f(G, z) == TRUE
)
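To make the variable group concrete, here is a small sketch that can be run end to end. It assumes the validate package is installed and that the helper is defined at top level so the rules can find it. The body of f and the data values are made up for illustration; only the names f, x, y and z come from the reply above.

```r
library(validate)

# Hypothetical vectorized helper: passes when a value is missing or a <= b.
f <- function(a, b) is.na(a) | is.na(b) | a <= b

# The group G stands for both x and y, so the single rule below
# is expanded into one rule per group member.
rules <- validator(
  G := var_group(x, y),
  f(G, z) == TRUE
)

# Made-up data: the second value of x is larger than z.
dat <- data.frame(x = c(1, 5, NA), y = c(2, 2, 2), z = c(3, 3, 3))

out <- confront(dat, rules)
summary(out)   # one row per expanded rule
values(out)    # record-wise results, one column per rule
```

The design choice here is to keep the comparison logic in one function and let the group do the repetition, so adding another column means adding one name to var_group().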
The other option is to generate the rules in a file and read them later.
template <- "f(%s,z) == TRUE"
txt <- paste(sprintf(template, some_vector_of_names), collapse="\n")
write(txt, file="rules.R")
rules <- validator(.file="rules.R")
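As a rough sketch of how the file-based approach might look with concrete names, assuming the validate package is installed: the column names admission, surgery and discharge and the helper f are hypothetical, and a temporary file is used so nothing is written to the working directory.

```r
library(validate)

# Hypothetical vectorized helper, defined at top level so the rules can find it.
f <- function(a, b) is.na(a) | is.na(b) | a <= b

# Hypothetical column names: each of these is compared against 'discharge'.
start_cols <- c("admission", "surgery")

# sprintf() is vectorized, so this yields one rule string per column name.
template <- "f(%s, discharge) == TRUE"
txt <- sprintf(template, start_cols)
print(txt)

# Write one rule per line and read the file back as a rule set.
rules_file <- tempfile(fileext = ".R")
writeLines(txt, rules_file)
rules <- validator(.file = rules_file)
```

Keeping the generated file around also makes it easy to review or hand-edit the rules before they are used.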
I have a similar issue. In a previous version I was able to use the infix operator A %==% B within rules, but this no longer seems to be the case. Do I have to rewrite all rules that used this operator to something like eq(A,B) == TRUE?
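`%==%` <- function(e1, e2) {
  if (length(e1) == length(e2)) {
    # equal values, or both missing, count as equal
    isEqual <- e1 == e2 | (is.na(e1) & is.na(e2))
    # one-sided missing values count as not equal
    isEqual[is.na(isEqual)] <- FALSE
    return(isEqual)
  } else {
    return(FALSE)
  }
}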
Thanks