easystats/datawizard

Define data frame method for `dw_data_xtabulates` object?


Although the printed output looks great:

library(datawizard)

data_tabulate(mtcars, am, by = "cyl")
#> am    |  4 | 6 |  8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0     |  3 | 4 | 12 |    0 |    19
#> 1     |  8 | 3 |  2 |    0 |    13
#> <NA>  |  0 | 0 |  0 |    0 |     0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 |    0 |    32

Sometimes a user may wish to extract this data into a data frame and process it further (e.g. convert it to a tidy format), but the coercion mangles the column names (`4` becomes `X4`, `<NA>` becomes `NA.`):

data_tabulate(mtcars, am, by = "cyl") |> 
  as.data.frame()
#>     am X4 X6 X8 NA.
#> 1    0  3  4 12   0
#> 2    1  8  3  2   0
#> 3 <NA>  0  0  0   0

Created on 2024-06-22 with reprex v2.1.0
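
Until a dedicated method exists, one possible workaround is to repair the names by hand and reshape to long format. The following is only a base-R sketch: `tab` reproduces the coerced output above by hand so the example runs without datawizard, and the names `tab`, `cyl_values`, `long`, `cyl` and `n` are illustrative assumptions, not part of the package.

```r
# Hand-built stand-in for the coerced data frame printed above
tab <- data.frame(
  am  = c("0", "1", NA),
  X4  = c(3L, 8L, 0L),
  X6  = c(4L, 3L, 0L),
  X8  = c(12L, 2L, 0L),
  NA. = c(0L, 0L, 0L)
)

# Undo the name mangling: strip the leading "X" and restore the NA category
count_cols <- setdiff(names(tab), "am")
cyl_values <- sub("^X", "", count_cols)
cyl_values[cyl_values == "NA."] <- NA

# One row per (am, cyl) combination, with the cell count in `n`
long <- data.frame(
  am  = rep(tab$am, times = length(count_cols)),
  cyl = rep(cyl_values, each = nrow(tab)),
  n   = unlist(tab[count_cols], use.names = FALSE)
)
```

Guessing the original categories back from `X4` or `NA.` is fragile, which is part of why a proper method that keeps the original names would be preferable.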

This partly works already. However, `data_tabulate.data.frame()` returns a list, because several tables can be printed at once, so the individual list elements have to be coerced:

library(datawizard)
x <- data_tabulate(mtcars, am, by = "cyl")

as.data.frame(x[[1]])
#>     am 4 6  8 NA
#> 1    0 3 4 12  0
#> 2    1 8 3  2  0
#> 3 <NA> 0 0  0  0
as.data.frame(format(x[[1]]))
#>                                        am  4 6  8 <NA> Total
#> 1                                       0  3 4 12    0    19
#> 2                                       1  8 3  2    0    13
#> 3                                    <NA>  0 0  0    0     0
#> rep.....ncol.ftab..                                         
#> c..Total...as.character.total_row.. Total 11 7 14    0    32


We could simplify this step and clean up the second output, where the separator and total rows end up with garbled row names.

What would as.data.frame() return when we have a list of data frames?

library(datawizard)
x <- data_tabulate(mtcars, c("am", "vs"), by = "cyl")
x
#> am    |  4 | 6 |  8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0     |  3 | 4 | 12 |    0 |    19
#> 1     |  8 | 3 |  2 |    0 |    13
#> <NA>  |  0 | 0 |  0 |    0 |     0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 |    0 |    32
#> 
#> vs    |  4 | 6 |  8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0     |  1 | 3 | 14 |    0 |    18
#> 1     | 10 | 4 |  0 |    0 |    14
#> <NA>  |  0 | 0 |  0 |    0 |     0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 |    0 |    32
str(x)
#> List of 2
#>  $ :Classes 'dw_data_xtabulate' and 'data.frame':    3 obs. of  5 variables:
#>   ..$ am: Factor w/ 2 levels "0","1": 1 2 NA
#>   ..$ 4 : int [1:3] 3 8 0
#>   ..$ 6 : int [1:3] 4 3 0
#>   ..$ 8 : int [1:3] 12 2 0
#>   ..$ NA: int [1:3] 0 0 0
#>   ..- attr(*, "total_n")= int 32
#>  $ :Classes 'dw_data_xtabulate' and 'data.frame':    3 obs. of  5 variables:
#>   ..$ vs: Factor w/ 2 levels "0","1": 1 2 NA
#>   ..$ 4 : int [1:3] 1 10 0
#>   ..$ 6 : int [1:3] 3 4 0
#>   ..$ 8 : int [1:3] 14 0 0
#>   ..$ NA: int [1:3] 0 0 0
#>   ..- attr(*, "total_n")= int 32
#>  - attr(*, "class")= chr [1:2] "dw_data_xtabulates" "list"
#>  - attr(*, "collapse")= logi FALSE
#>  - attr(*, "is_weighted")= logi FALSE


A good data structure for this purpose would be a nested data frame, in which one column is a list of data frames.

Here is an example with the tidyverse, but this is also possible in base R:

library(datawizard)
selected_vars <- c("am", "vs")
x <- data_tabulate(mtcars, selected_vars, by = "cyl")

tibble::tibble(
  var = selected_vars,
  table = purrr::map(x, ~ as.data.frame(.x))
)
#> # A tibble: 2 × 2
#>   var   table       
#>   <chr> <list>      
#> 1 am    <df [3 × 5]>
#> 2 vs    <df [3 × 5]>


library(datawizard)
selected_vars <- c("am", "vs", "gear")
x <- data_tabulate(mtcars, selected_vars, by = "cyl")

out <- data.frame(
  var = selected_vars,
  table = I(lapply(x, as.data.frame))
)

out$table[[2]]
#>     vs  4 6  8 NA
#> 1    0  1 3 14  0
#> 2    1 10 4  0  0
#> 3 <NA>  0 0  0  0

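
Wrapped into an S3 method, the nested data frame idea could look roughly like the sketch below. This is an assumption about how such a method might be written, not the package's actual implementation. To keep it runnable without datawizard, `mock_table()` is a hypothetical helper that imitates the class structure shown in the `str(x)` output earlier in this thread, and the tabulated variable is taken from the first column name of each table.

```r
# Sketch of a possible method: one row per table, tables kept in a list column
as.data.frame.dw_data_xtabulates <- function(x, ...) {
  # Elements inherit from data.frame, so this falls back to the default method
  tables <- lapply(x, as.data.frame)
  # Assumption: the first column of each table is the tabulated variable
  vars <- vapply(tables, function(tab) names(tab)[1], character(1))
  data.frame(var = vars, table = I(tables))
}

# Hypothetical helper that mimics a single cross table
mock_table <- function(var, c4, c6, c8) {
  tab <- data.frame(factor(c("0", "1", NA)), c4, c6, c8, c(0L, 0L, 0L))
  names(tab) <- c(var, "4", "6", "8", "NA")
  class(tab) <- c("dw_data_xtabulate", "data.frame")
  tab
}

# Mock of the list returned for c("am", "vs") by "cyl"
x <- structure(
  list(
    mock_table("am", c(3L, 8L, 0L), c(4L, 3L, 0L), c(12L, 2L, 0L)),
    mock_table("vs", c(1L, 10L, 0L), c(3L, 4L, 0L), c(14L, 0L, 0L))
  ),
  class = c("dw_data_xtabulates", "list")
)

out <- as.data.frame(x)
out$table[[2]]
```

Using `I()` keeps the tables as a list column in a plain data frame, so the result works without tibble and can still be unnested later.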