data-cleaning/validate

Question - "threw error" <> 0 but no failures in plot

jonysafi opened this issue · 1 comment

Hi,
First, great job on the package. I have been testing it the whole week and want to adopt it for a lot of things I am doing.

I have the following scenario:

I am building a set of rules to validate a dataframe (the source is a CSV file).

The dataframe structure is:

> str(df)
'data.frame':	585 obs. of  6 variables:
 $ dt_collect    : POSIXct, format: "2019-09-20" "2019-09-21" "2019-09-22" "2019-09-23" ...
 $ attrib        : chr  "A" "A" "A" "A" ...
 $ num1          : num  0.99 1 1.01 0.99 0.99 1 1 0.96 0.94 0.95 ...
 $ num2          : num  21.9 22.4 22.7 22.8 22.9 ...
 $ num3          : num  2.22 2.3 2.2 2.19 2.15 2.56 2.75 2.96 3 2.97 ...
 $ num4          : num  22.1 22.6 22.8 22.9 22.9 ...

The rules look like:
rules <- validator(is_unique(attrib, dt_collect),
                   length(unique(dt_collect)) >= 365,
                   in_range(dt_collect, min = min(dt_collect), max = max(dt_collect)),
                   is.POSIXct(dt_collect),
                   is.character(attrib),
                   is.numeric(num1),
                   is.numeric(num2),
                   is.numeric(num3),
                   is.numeric(num4)) 

When I run confront:

out <- confront(df, rules)

I see the following:

Object of class 'validation'
Call:
    confront(dat = df, x = rules)

Rules confronted: 9
   With fails   : 0
   With missings: 0
   Threw warning: 0
   Threw error  : 1

Running df_out <- as.data.frame(summary(out)) shows the following:

  name items passes fails nNA error warning                                                         expression
1   V1   585    585     0   0 FALSE   FALSE                                      is_unique(attrib, dt_collect)
2   V2     1      1     0   0 FALSE   FALSE                                  length(unique(dt_collect)) >= 365
3   V3   585    585     0   0 FALSE   FALSE in_range(dt_collect, min = min(dt_collect), max = max(dt_collect))
4   V4     0      0     0   0  TRUE   FALSE                                             is.POSIXct(dt_collect)
5   V5     1      1     0   0 FALSE   FALSE                                               is.character(attrib)
6   V6     1      1     0   0 FALSE   FALSE                                                   is.numeric(num1)
7   V7     1      1     0   0 FALSE   FALSE                                                   is.numeric(num2)
8   V8     1      1     0   0 FALSE   FALSE                                                   is.numeric(num3)
9   V9     1      1     0   0 FALSE   FALSE                                                   is.numeric(num4)

The plot(out) comes in green.
My question is: what threw the error? Is it is.POSIXct (row V4, where TRUE appears in the error column)? If so, what is the error, and why is it not reported more clearly?

dt_collect comes in as chr from the CSV file, and I can convert it with as.Date or as.POSIXct. However, both is.Date and is.POSIXct in the rules show the behavior described above.

Thank you,

Hi, please try

errors(out)

to get the explicit error message. You can also have the error thrown immediately by running

confront(df, rules, raise="all")
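For anyone landing here later, the two suggestions above can be tried on a small stand-in data frame. This is only a sketch under stated assumptions: the toy values are made up, and the thread never records what errors(out) actually returned. One plausible cause (an assumption, not confirmed here) is that is.POSIXct() and is.Date() are not base R functions (lubridate, for example, provides them), so the rule cannot be evaluated unless such a package is attached. The `"POSIXct" %in% class(...)` rule is a base-R-only alternative shown for comparison.

```r
library(validate)

# Toy stand-in for the data frame in the question (values are made up).
df <- data.frame(
  dt_collect = as.POSIXct(c("2019-09-20", "2019-09-21", "2019-09-22"), tz = "UTC"),
  attrib     = c("A", "A", "A"),
  num1       = c(0.99, 1.00, 1.01),
  stringsAsFactors = FALSE
)

rules <- validator(
  is.POSIXct(dt_collect),            # not a base R function; errors unless e.g. lubridate is attached
  "POSIXct" %in% class(dt_collect),  # assumption: a base-R-only way to express the same check
  is.numeric(num1)
)

# Default behaviour: errors are caught and only flagged in the summary.
out <- confront(df, rules)
print(summary(out)[, c("name", "error", "expression")])

# errors() returns the captured messages, one per rule that threw an error.
print(errors(out))

# With raise = "all" the error is not caught by confront(), so we catch it
# ourselves here only to keep the script running.
msg <- tryCatch(
  confront(df, rules, raise = "all"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the message does turn out to be about a missing function, either attaching the package that provides it before calling confront() or switching to the class-based rule should make the rule evaluate normally.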