Define data frame method for `dw_data_xtabulates` object?
The printed output looks great:
library(datawizard)
data_tabulate(mtcars, am, by = "cyl")
#> am | 4 | 6 | 8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0 | 3 | 4 | 12 | 0 | 19
#> 1 | 8 | 3 | 2 | 0 | 13
#> <NA> | 0 | 0 | 0 | 0 | 0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 | 0 | 32
Sometimes a user may wish to extract this data into a data frame and then process it further (e.g. convert it to tidy data format), but the coercion doesn't produce great column names:
data_tabulate(mtcars, am, by = "cyl") |>
  as.data.frame()
#> am X4 X6 X8 NA.
#> 1 0 3 4 12 0
#> 2 1 8 3 2 0
#> 3 <NA> 0 0 0 0
Created on 2024-06-22 with reprex v2.1.0
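The `X4` and `NA.` labels look like the result of base R's name checking: `data.frame()` passes column names through `make.names()` unless `check.names = FALSE` is set. A minimal sketch of that effect, assuming this is indeed the cause here, and using names and counts typed in by hand from the output above rather than a real `data_tabulate()` object:

```r
# Column labels as they appear in the cross table (copied from the output above)
xtab_names <- c("am", "4", "6", "8", "NA")

# make.names() turns them into syntactically valid R names
fixed_names <- make.names(xtab_names)
fixed_names
#> [1] "am"  "X4"  "X6"  "X8"  "NA."

# Hand-built stand-in for the table; assigning names() afterwards skips the
# name check, so the original labels survive
counts <- data.frame(
  c("0", "1", NA),
  c(3L, 8L, 0L),
  c(4L, 3L, 0L),
  c(12L, 2L, 0L),
  c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)
names(counts) <- xtab_names
names(counts)
#> [1] "am" "4"  "6"  "8"  "NA"
```

Non-syntactic names such as `4` then have to be accessed with `counts[["4"]]` or backticks, which is the usual trade-off of keeping the original labels.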
It somehow works already, but `data_tabulate.data.frame()` returns a list, because you can print multiple tables at once.
library(datawizard)
x <- data_tabulate(mtcars, am, by = "cyl")
as.data.frame(x[[1]])
#> am 4 6 8 NA
#> 1 0 3 4 12 0
#> 2 1 8 3 2 0
#> 3 <NA> 0 0 0 0
as.data.frame(format(x[[1]]))
#> am 4 6 8 <NA> Total
#> 1 0 3 4 12 0 19
#> 2 1 8 3 2 0 13
#> 3 <NA> 0 0 0 0 0
#> rep.....ncol.ftab..
#> c..Total...as.character.total_row.. Total 11 7 14 0 32
We could simplify this step, and make the second output more beautiful.
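As a rough illustration of what a cleaner version of the second output could look like, the totals can be attached with plain base R so that no stray row names appear. This is only a sketch on a hand-built table (counts copied from the printed output above), not how `format()` works in datawizard:

```r
# Counts copied by hand from the printed table above (not a real
# data_tabulate() object)
tab <- data.frame(
  c("0", "1", NA),
  c(3L, 8L, 0L),
  c(4L, 3L, 0L),
  c(12L, 2L, 0L),
  c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)
names(tab) <- c("am", "4", "6", "8", "NA")

# Row totals go into a new "Total" column
with_totals <- tab
with_totals$Total <- unname(rowSums(tab[-1]))

# Column totals go into one extra row, built from an existing row so that
# the column names line up for rbind()
total_row <- with_totals[1, ]
total_row[[1]] <- "Total"
total_row[-1] <- as.list(colSums(with_totals[-1]))

out <- rbind(with_totals, total_row)
rownames(out) <- NULL # reset to plain 1..n row names
out
```

The result has four rows (`0`, `1`, `<NA>`, `Total`) and keeps the original column labels, with the grand total of 32 in the bottom-right cell.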
What would `as.data.frame()` return when we have a list of data frames?
library(datawizard)
x <- data_tabulate(mtcars, c("am", "vs"), by = "cyl")
x
#> am | 4 | 6 | 8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0 | 3 | 4 | 12 | 0 | 19
#> 1 | 8 | 3 | 2 | 0 | 13
#> <NA> | 0 | 0 | 0 | 0 | 0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 | 0 | 32
#>
#> vs | 4 | 6 | 8 | <NA> | Total
#> ------+----+---+----+------+------
#> 0 | 1 | 3 | 14 | 0 | 18
#> 1 | 10 | 4 | 0 | 0 | 14
#> <NA> | 0 | 0 | 0 | 0 | 0
#> ------+----+---+----+------+------
#> Total | 11 | 7 | 14 | 0 | 32
str(x)
#> List of 2
#> $ :Classes 'dw_data_xtabulate' and 'data.frame': 3 obs. of 5 variables:
#> ..$ am: Factor w/ 2 levels "0","1": 1 2 NA
#> ..$ 4 : int [1:3] 3 8 0
#> ..$ 6 : int [1:3] 4 3 0
#> ..$ 8 : int [1:3] 12 2 0
#> ..$ NA: int [1:3] 0 0 0
#> ..- attr(*, "total_n")= int 32
#> $ :Classes 'dw_data_xtabulate' and 'data.frame': 3 obs. of 5 variables:
#> ..$ vs: Factor w/ 2 levels "0","1": 1 2 NA
#> ..$ 4 : int [1:3] 1 10 0
#> ..$ 6 : int [1:3] 3 4 0
#> ..$ 8 : int [1:3] 14 0 0
#> ..$ NA: int [1:3] 0 0 0
#> ..- attr(*, "total_n")= int 32
#> - attr(*, "class")= chr [1:2] "dw_data_xtabulates" "list"
#> - attr(*, "collapse")= logi FALSE
#> - attr(*, "is_weighted")= logi FALSE
A good data structure for this purpose would be a nested data frame, in which one column is a list of data frames.
Here is an example with the tidyverse, but this is also possible with base R:
library(datawizard)
selected_vars <- c("am", "vs")
x <- data_tabulate(mtcars, selected_vars, by = "cyl")
tibble::tibble(
  var = selected_vars,
  table = purrr::map(x, ~ as.data.frame(.x))
)
#> # A tibble: 2 × 2
#> var table
#> <chr> <list>
#> 1 am <df [3 × 5]>
#> 2 vs <df [3 × 5]>
The same structure with base R:
library(datawizard)
selected_vars <- c("am", "vs", "gear")
x <- data_tabulate(mtcars, selected_vars, by = "cyl")
out <- data.frame(
  var = selected_vars,
  table = I(lapply(x, as.data.frame))
)
out$table[[2]]
#> vs 4 6 8 NA
#> 1 0 1 3 14 0
#> 2 1 10 4 0 0
#> 3 <NA> 0 0 0 0
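To make the nested layout discussed above more concrete, here is a sketch of what an `as.data.frame()` method for `dw_data_xtabulates` could look like. It is a hypothetical implementation, not the package's code: the object is rebuilt by hand from the `str()` output shown earlier, `make_tab()` is a helper invented for this example, and the variable name is taken from the first column of each table.

```r
# Hand-built stand-in for the object shown by str() earlier in the thread;
# the real one is created by data_tabulate(). make_tab() is a hypothetical helper.
make_tab <- function(var, n4, n6, n8) {
  tab <- data.frame(factor(c("0", "1", NA)), n4, n6, n8, c(0L, 0L, 0L))
  names(tab) <- c(var, "4", "6", "8", "NA")
  class(tab) <- c("dw_data_xtabulate", "data.frame")
  tab
}
x <- structure(
  list(
    make_tab("am", c(3L, 8L, 0L), c(4L, 3L, 0L), c(12L, 2L, 0L)),
    make_tab("vs", c(1L, 10L, 0L), c(3L, 4L, 0L), c(14L, 0L, 0L))
  ),
  class = c("dw_data_xtabulates", "list")
)

# Sketch of a possible method: one row per table, tables kept in a list-column
as.data.frame.dw_data_xtabulates <- function(x, ...) {
  tables <- lapply(unclass(x), function(tab) {
    class(tab) <- "data.frame" # drop the custom class, keep the column names
    tab
  })
  out <- data.frame(
    # assumption: the first column of each table is named after the variable
    var = vapply(tables, function(tab) names(tab)[1], character(1)),
    stringsAsFactors = FALSE
  )
  out$table <- tables # assign afterwards so the list stays a single column
  out
}

nested <- as.data.frame(x)
nested$var
names(nested$table[[2]])
```

Each element of `nested$table` is a plain data frame with the original column labels, so it can be processed further or unnested into a long format as needed.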