tidyverse/readxl

readxl fails with truncated dates

stitam opened this issue · 0 comments

Related to #462, #716, #737.

In my data set the dates are sometimes truncated:

df <- data.frame(Sample = c("S1","S2","S3"), Date = c("2020-01-01","2023-05","2015"))

When I want to import this data frame into R using readxl, I seem to have two options: either import the Date column as text, in this case "2020-01-01" will become 43831, or import it as date, in this case "2023-05" will become NA and "2015" will become "1905-07-07" without any warnings.

One way to solve this is to import the column as text and then use a custom function to convert back to the original, using a function like this (though it is strange that origin required for accurate conversion is not 1970-01-01):

recover_dates <- function(x) {
  foo <- function(A) {
    A_num <- suppressWarnings(as.numeric(A))
    if (!is.na(A_num) && A_num > 10000) {
      return(as.character(as.Date(A_num, origin = "1899-12-30")))
    } else {
      return(A)
    }
  }
  unname(sapply(x, foo))
}

recover_dates(df$Date)

However, it would be even better if col_types = "text" would just read in the Date column as is, without any conversions.

(You might wonder how I can use these truncated dates in any analysis: I usually simulate a valid date within the boundaries of the information I have, but for this, I need to read in the values as is.)

EDIT: if I manually export a sheet of an xlsx as csv I can then import it using read.csv() and it works as intended.