data-cleaning/validate

OR not working properly

adewhite81 opened this issue · 3 comments

Hello all,

I have only been working with the package for a short time. I need to validate rules with logical operators. In my simple script, I tested the AND and OR operators. The first works correctly, while the second seems to have some problems, or maybe I'm doing something wrong.

Why is the error property of the summary TRUE when I use the OR operator? Please see the example below.

Here is my code snippet:

df2 <- data.frame(A=c(NA,2,NA,4,5), B=c(10,NA,30,40,50), C=c(NA,200,300,4,500))


rules <- validator( 
is.na(A) & is.na(C), #error false
is.na(B)| is.na(C),#error false
  is.na(A) | is.na(C),#error false
  (**A>=B | c<B ),#error true
  (A>=B || c<B ),#error true
  !(!(A>=B) &&  !(c<=B) ) #error true
  , !(!(A>=B) & !(c<=B) ))#error true**


out <- confront(df2, rules)
ve<-summary(out)
print(out)


Hi there,

you can use

errors(out)

to see the error messages. Looking at your code, I see the use of several column names in your rules that do not exist. Specifically: in V4 the variable **A does not exist in your dataset. In V5 to V7 the variable c does not exist in your dataset (remember that R is case-sensitive, so c and C are different variable names).
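A quick way to spot this kind of mismatch before confronting the data is to compare the names used in each rule with the columns of the data frame. The following is only a minimal base-R sketch, not part of validate; the names rule_exprs and missing_vars are made up for illustration, and only two of the rules from the question are copied in.

```r
# Base R only; the data frame and the rule expressions come from the question.
df2 <- data.frame(A = c(NA, 2, NA, 4, 5),
                  B = c(10, NA, 30, 40, 50),
                  C = c(NA, 200, 300, 4, 500))

# Hypothetical helper list: two of the rules, stored as unevaluated expressions.
rule_exprs <- list(
  V5 = quote(A >= B || c < B),
  V6 = quote(!(!(A >= B) && !(c <= B)))
)

# For each rule, list the variables it uses that are not columns of df2.
# all.vars() returns the variable names in an expression, without function names.
missing_vars <- lapply(rule_exprs, function(e) setdiff(all.vars(e), names(df2)))
print(missing_vars)
```

Any name that shows up in the result (here the lower-case c) is a variable the rule refers to that the data set does not contain.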

HTH,
Mark

You are using invalid syntax. Avoid && and ||, and use & and | instead. Also, there is some unnecessary bracketing there. The following seems to work.

> rules <- validator(
    is.na(A) & is.na(C), 
    is.na(B)| is.na(C),
    is.na(A) | is.na(C),
    A>=B | C<B ,
    A>=B | C<B ,
    !(!(A>=B) &  !(C<=B) ) 
   , !(!(A>=B) & !(C<=B) )
 )
> df2 <- data.frame(A=c(NA,2,NA,4,5), B=c(10,NA,30,40,50), C=c(NA,200,300,4,500))
> out <- confront(df2, rules)
> out
Object of class 'validation'
Call:
    confront(dat = df2, x = rules)

Rules confronted: 7
   With fails   : 7
   With missings: 4
   Threw warning: 0
   Threw error  : 0
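
As background on why & and | fit here: in R they work element by element, so a rule built with them gives one result per record, whereas && and || are meant for single TRUE/FALSE values. The following is a small base-R sketch of the fourth rule evaluated directly on the columns, outside of validate; the name res is only for illustration.

```r
# Base R only; same data as in the question.
df2 <- data.frame(A = c(NA, 2, NA, 4, 5),
                  B = c(10, NA, 30, 40, 50),
                  C = c(NA, 200, 300, 4, 500))

# Element-wise OR: one logical value (TRUE, FALSE or NA) per row.
res <- with(df2, A >= B | C < B)
print(res)

# The result has as many elements as there are records.
length(res) == nrow(df2)
```

Rows where either comparison involves a missing value and the other side is not TRUE come out as NA, which matches the "With missings" count being non-zero in the output above.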