Draft for discussion.
This issue was closed : tidyverse/dplyr#6547
Is this extension package useful ?
{populate} provides stricter wrappers around dplyr::mutate()
:
populate()
makes sure the input’s ptype is preserved- no column type change
- no column addition
- no column deletion (
mutate()
can delete withNULL
) - cast by default, assert optionally
collate()
only creates new columns
Do we need more or should we keep it simple?
abrogate()
to remove columns ?sublimate()
likepopulate()
but allow ptype change (doesn’t allow new col creation) ?summarize()
variants ?
Is it better to use a subclass of data frame (as in linked github issue)
so we can make safe versions of everything including functions like
count()
, unnest()
, joins …
We’d lose autocomplete of new args, maybe more confusing than having new verbs ?
You can install the development version of populate like so:
devtools::install_github("cynkra/populate")
library(populate)
data <- tibble::tibble(
a = letters[1:2],
b = c(1,2),
c = factor(letters[1:2]),
d = as.Date(c("2022-01-01", "2022-01-02")),
e = vctrs::list_of(cars)
)
# can't create a column if it exists
collate(data, a = 1)
#> Error in `collate()`:
#> ! `collate()` can't override existing columns.
#> ℹ Found in data: `a`.
#> ℹ Do you need `populate()`?
# but we can create new columns
collate(data, ee = 1)
#> # A tibble: 2 × 6
#> a b c d e ee
#> <chr> <dbl> <fct> <date> <list<df[,2]>> <dbl>
#> 1 a 1 a 2022-01-01 [50 × 2] 1
#> 2 b 2 b 2022-01-02 [50 × 2] 1
# we can't create a new column with populate()
populate(data, ee = 1)
#> Error in `populate()`:
#> ! `populate()` can't create new columns.
#> ✖ Not found in data: `ee`.
#> ℹ Did you mispell `e`?
#> ℹ Do you need `collate()`?
# can't cast character to double
populate(data, a = 1)
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `a = vctrs::vec_cast(1, vctrs::vec_ptype(a))`.
#> Caused by error:
#> ! Can't convert `1` <double> to <character>.
# casting integer to double
populate(data, b = 3:4)
#> # A tibble: 2 × 5
#> a b c d e
#> <chr> <dbl> <fct> <date> <list<df[,2]>>
#> 1 a 3 a 2022-01-01 [50 × 2]
#> 2 b 4 b 2022-01-02 [50 × 2]
# doesn't work if `.strict` is `TRUE`
populate(data, b = 3:4, .strict = TRUE)
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `b = vctrs::vec_assert(3:4,
#> vctrs::vec_ptype(b))`.
#> Caused by error:
#> ! `3:4` must be a vector with type <double>.
#> Instead, it has type <integer>.
# casting character to factor with allowed levels
populate(data, c = c("b", "b"))
#> # A tibble: 2 × 5
#> a b c d e
#> <chr> <dbl> <fct> <date> <list<df[,2]>>
#> 1 a 1 b 2022-01-01 [50 × 2]
#> 2 b 2 b 2022-01-02 [50 × 2]
# can't cast because wrong levels
populate(data, c = c("b", "d"))
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `c = vctrs::vec_cast(c("b", "d"),
#> vctrs::vec_ptype(c))`.
#> Caused by error:
#> ! Can't convert from `c("b", "d")` <character> to <factor<38051>> due to loss of generality.
#> • Locations: 2
# datetimes are casted to date
populate(data, d = lubridate::as_datetime(c("2022-01-01", "2022-01-02")))
#> # A tibble: 2 × 5
#> a b c d e
#> <chr> <dbl> <fct> <date> <list<df[,2]>>
#> 1 a 1 a 2022-01-01 [50 × 2]
#> 2 b 2 b 2022-01-02 [50 × 2]
# characters can't be casted to date
populate(data, d = c("2022-01-01", "2022-01-02"))
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `d = vctrs::vec_cast(c("2022-01-01",
#> "2022-01-02"), vctrs::vec_ptype(d))`.
#> Caused by error:
#> ! Can't convert `c("2022-01-01", "2022-01-02")` <character> to <date>.
# using list_of allowed us to prevent corrupting our data silently
populate(data, e = list(iris))
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `e = vctrs::vec_cast(list(iris),
#> vctrs::vec_ptype(e))`.
#> Caused by error:
#> ! Can't convert `list(iris)` <list> to <list_of<
#> data.frame<
#> speed: double
#> dist : double
#> >
#> >>.
# and we don't have to bother with list_of anymore if we feed the right format
populate(data, e = list(head(cars)))
#> Error in `populate()`:
#> ! Can't `populate()` the data.
#> Caused by error in `dplyr::mutate()`:
#> ! Problem while computing `e = vctrs::vec_cast(list(head(cars)),
#> vctrs::vec_ptype(e))`.
#> Caused by error:
#> ! Can't convert `list(head(cars))` <list> to <list_of<
#> data.frame<
#> speed: double
#> dist : double
#> >
#> >>.