eddelbuettel/anytime

Warn for NA's caused by `anydate()`

asadow opened this issue ยท 12 comments

asadow commented

Can we get a warning for any NA's caused by anydate() similar to lubridate::parse_date()?

But isn't the presence of NA the warning already? You can always wrap a function around from your that checks with any(is.na(...)) over the result set.

asadow commented

You might not notice the NA without a warning. I've caught a lot of mistakes from lubridate's warnings.

Maybe writing a local anytime_safe(....) that calls anytime(...) and checks the result for NAs and then proceeds with a warning as you desire. It still strikes me as something I would not want to add to anytime directly.

I do grant you that as.Date() also hollers:

> as.Date("2023-09-08")
[1] "2023-09-08"
> as.Date("2023-19-08")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
> 

The problem is that the absence of NA is not a guarantee the right happened. The ambiguity with non-ISO inputs is one example:

> anytime::anydate("12/13/2023")
[1] "2023-12-13"
> anytime::anydate("13/12/2023")
[1] NA
> 

Maybe the first one was 'wrong' too because it simply silently converted under the wrong guess of format? It's a hard problem.

asadow commented

What if the first one gave a message as to how the format was guessed, much like how read_csv() gives messages on which column types were guessed?

A message like that and a warning when an input becomes NA won't guarantee the right thing happened, but they will help show what's happening.

Maybe I am slowly coming around to your view. I'll need a moment to look at the code. There are frankly too many entry and exit points but maybe I can limit it to 'what happens in anytime_cpp()'

  • respect an options() at startup to no do this and act as before
  • before returning check for any NAs in the result set and warn if any found

One problem is that anytime() and utctime() and anydate() and utcdate() do not have a common 'single' core.

So maybe the simple safe_anytime() wrapper is easiest.

You are suggesting behavior in terms of tidyverse packages and functions (reference to lubridate, reference to read_csv) and I should get off my chest that I do not consider these to be normative. Only Base R is.

The rest are opionionated add-ons, including my packages. So if lubridate does what you want, but all means use and enjoy it! It is a package that appears to make a lot of people happy enough. I wrote anytime for myself and it suits me (and seemingly a few other people too).

asadow commented

This is more of a humble suggestion than a personal request/issue. I agree a simple wrapper is a great solution instead of asking for features. I am just thinking of future users who might not be comfortable with wrappers.

I admit I'm a biased fan of tidyverse!

You do raise a fair point, and it would be a little easier if we had one input and output. Consider for example that I also dispatch (via S3) to as.Date() or as.POSIXct() if I already get that as input. (Arguably a corner case as the staple is conversion from character.) Consider now we have proper input in that class, but it contains a NA. Similar to

> as.Date(Sys.Date()+c(0:2,NA,4))        
[1] "2023-09-08" "2023-09-09" "2023-09-10" NA           "2023-09-12"            
>  

If we add a check to the C++ parts of anytime() then our anydate() would do the same on that input, not warn, yet return a NA. That is .... kinda not great as it remains silent if we add a NA test to the from-character conversion.

And adding it everywhere seems ... heavy. So tough one.

Not grealy in love with this yet but we now warn when a NA is seen:

edd@rob:~/git/anytime(feature/na_warning)$ Rscript -e 'library(anytime); anydate(c("13/13/2023", format(Sys.Date()), "44/44/4444"))'
[1] NA           "2023-09-09" NA          
Warning messages:
1: In anytime_cpp(x = x, tz = tz, asUTC = asUTC, asDate = TRUE, useR = useR,  :
  Input conversion of '13/13/2023' resulted in 'NA' results.
2: In anytime_cpp(x = x, tz = tz, asUTC = asUTC, asDate = TRUE, useR = useR,  :
  Input conversion of '44/44/4444' resulted in 'NA' results.
edd@rob:~/git/anytime(feature/na_warning)$ 
asadow commented

That looks great. It should help researchers (like my former self) with manually entered dates. The only feedback I can add is that avoiding the repetition of In anytime_cpp.. would make it even more user-friendly. I can't comment on your previous post, but it seems you solved it. Thank you, Dirk.

Yeppers. I showed two occassions in my example for two reasons:

  • it can be useful to see a NA for each failed conversion
  • it can be tedious to see several

and I was leaning also towards to latter so I have to think about about just added a state variable to ensure we print at most one.

(Also the overall format is not "great" but it is the right thing to do this via the warnings() interface as it has extra handles for extra features.)

Changed to this:

edd@rob:~/git/anytime(feature/na_warning)$ Rscript -e 'library(anytime); anydate(c("13/13/2023", format(Sys.Date()), "44/44/4444"))'
[1] NA           "2023-09-13" NA          
Warning message:
In anytime_cpp(x = x, tz = tz, asUTC = asUTC, asDate = TRUE, useR = useR,  :
  Input conversion of '13/13/2023' resulted in 'NA' results.
edd@rob:~/git/anytime(feature/na_warning)$ 

So it will warn once only. Probably good enough.