tidyverse/vroom

Feature request: Column-level na values via collectors

khusmann opened this issue

Currently, NA values can only be specified globally, via the na argument of the read_*() family of functions. Sometimes I want to supply NA values for specific columns rather than for the entire data set. A clean way to support this would be to add an na argument to every collector type, so that missing values can be specified per column.

Column-level missing values come up frequently in survey data. Here are two examples:

Example 1:

What is your current stress level?
a. Low (LOW)
b. Moderate (MODERATE)
c. High (HIGH)
d. I don’t know (DONT_KNOW)
e. I don’t understand the question (DONT_UNDERSTAND)

I'd like to be able to create a col_factor() collector that reads the last two responses as NA:

col_factor(levels = c("LOW", "MODERATE", "HIGH"), ordered = TRUE, na = c("DONT_KNOW", "DONT_UNDERSTAND"))

Example 2:

An item that records an individual's height as a double, but that can also contain the missing-value codes "ABSENT" and "RULER_BROKE":

col_double(na = c("ABSENT", "RULER_BROKE"))
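Until collectors accept an na argument, a possible workaround is to read the affected columns as character (e.g. with col_character()) and convert the column-specific codes to NA before parsing. The sketch below uses only base R; the helper codes_to_na() is hypothetical and not part of vroom, and it only approximates the semantics the proposed na argument would have.

```r
# Hypothetical helper (not part of vroom): replace column-specific
# missing-value codes with NA in a character vector before parsing.
codes_to_na <- function(x, na) {
  x[x %in% na] <- NA_character_
  x
}

# Example 1: stress level as an ordered factor; the two
# "don't know / don't understand" codes become NA, not levels.
stress_raw <- c("LOW", "HIGH", "DONT_KNOW", "MODERATE", "DONT_UNDERSTAND")
stress <- factor(
  codes_to_na(stress_raw, c("DONT_KNOW", "DONT_UNDERSTAND")),
  levels = c("LOW", "MODERATE", "HIGH"),
  ordered = TRUE
)

# Example 2: height as a double; "ABSENT" and "RULER_BROKE" become NA,
# so as.double() sees only numbers and NA (no coercion warnings).
height_raw <- c("172.5", "ABSENT", "160", "RULER_BROKE")
height <- as.double(codes_to_na(height_raw, c("ABSENT", "RULER_BROKE")))
```

The drawback of this approach is that the missing-value codes live outside the column specification, which is exactly what a collector-level na argument would avoid.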