`group_by` + equality `filter` on column with `NA` values introduces `NA` rows
geoffreyyip opened this issue · 2 comments
geoffreyyip commented
When equality filtering on a column with NA
values, doing a group_by
operation prior to filtering can introduce rows with NA
values. This is unexpected since tidyverse
does not introduce NA
values
Note: I use str()
to print the output, since tidytable currently doesn't print in reprex()
See Issue #810
library(tidytable, tidyverse)
#>
#> Attaching package: 'tidytable'
#> The following objects are masked from 'package:stats':
#>
#> dt, filter, lag
#> The following object is masked from 'package:base':
#>
#> %in%
#------------------------------------------------------------------------------
# The tidyverse version GOOD: No NAs introduced
#------------------------------------------------------------------------------
t1 <- dplyr::tribble(
~id, ~status,
1, 1,
4, 0,
7, NA,
)
output1 <- t1 |>
dplyr::group_by(id) |>
dplyr::filter(status == 1)
str(output1)
#> gropd_df [1 × 2] (S3: grouped_df/tbl_df/tbl/data.frame)
#> $ id : num 1
#> $ status: num 1
#> - attr(*, "groups")= tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ id : num 1
#> ..$ .rows: list<int> [1:1]
#> .. ..$ : int 1
#> .. ..@ ptype: int(0)
#> ..- attr(*, ".drop")= logi TRUE
#------------------------------------------------------------------------------
# The tidytable version BAD: An extra NA row is introduced with grouped + NA values
#------------------------------------------------------------------------------
t2 <- tidytable::tribble(
~id, ~status,
1, 1,
4, 0,
7, NA,
)
output2 <- t2 |>
tidytable::group_by(id) |>
tidytable::filter(status == 1)
str(output2)
#> Classes 'grouped_tt', 'tidytable', 'data.table' and 'data.frame': 2 obs. of 2 variables:
#> $ id : num 1 NA
#> $ status: num 1 NA
#> - attr(*, ".internal.selfref")=<externalptr>
#> - attr(*, "groups")= chr "id"
#------------------------------------------------------------------------------
# The tidytable version GOOD: Without the group_by, it works fine
#------------------------------------------------------------------------------
t3 <- tidytable::tribble(
~id, ~status,
1, 1,
4, 0,
7, NA,
)
output3 <- t3 |>
# tidytable::group_by(id) |>
tidytable::filter(status == 1)
str(output3)
#> Classes 'tidytable', 'data.table' and 'data.frame': 1 obs. of 2 variables:
#> $ id : num 1
#> $ status: num 1
#> - attr(*, ".internal.selfref")=<externalptr>
#------------------------------------------------------------------------------
# The tidytable version GOOD: Without the NA in the table, it also works fine
#------------------------------------------------------------------------------
t4 <- tidytable::tribble(
~id, ~status,
1, 1,
4, 0,
# 7, NA,
)
output4 <- t4 |>
tidytable::group_by(id) |>
tidytable::filter(status == 1)
str(output4)
#> Classes 'grouped_tt', 'tidytable', 'data.table' and 'data.frame': 1 obs. of 2 variables:
#> $ id : num 1
#> $ status: num 1
#> - attr(*, ".internal.selfref")=<externalptr>
#> - attr(*, "groups")= chr "id"
Created on 2024-07-01 with reprex v2.1.0
markfairbanks commented
Hmm that's odd I'll take a look
markfairbanks commented
All fixed - thanks for catching this