strengejacke/sjmisc

Incorrect group names in output of frq() using grouped data

andersson10 opened this issue · 1 comments

When I call frq() on a tibble (created with tibble::tibble) that has been grouped with dplyr::group_by, the printed group names are not always attached to the correct group. The output differs from what I get when the data is created with base::data.frame instead.

Consider the following:

library(tibble)
library(dplyr)
library(sjmisc)

df_1 <- tibble(
  x = rep(c("b", "a"), each = 2),
  y = rep(1:2, each = 2)
)

frq(group_by(df_1, x))

The output from frq(group_by(df_1, x)) above is:

> frq(group_by(df_1, x))

Grouped by:
x: b
 
# y <integer> 
# total N=2  valid N=2  mean=2.00  sd=0.00
 
  val frq raw.prc valid.prc cum.prc
    2   2     100       100     100
 <NA>   0       0        NA      NA

Grouped by:
x: a
 
# y <integer> 
# total N=2  valid N=2  mean=1.00  sd=0.00
 
  val frq raw.prc valid.prc cum.prc
    1   2     100       100     100
 <NA>   0       0        NA      NA

> 

This output seems to be incorrect in that the values of the grouping variable have switched places. For instance, the mean of y for group x: b should be 1, not 2, as displayed.

> mean(df_1$y[df_1$x == "b"])
[1] 1
> 

Where it says Grouped by: x: b it seems it should say Grouped by: x: a.
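As an independent cross-check of the expected per-group means, here is a small base R sketch that does not involve frq() or dplyr. It rebuilds the same data with data.frame(stringsAsFactors = FALSE), which is an assumption made only to keep x a character column without loading tibble. The name group_means is illustrative.

```r
# Same data as df_1, built in base R; x is kept as a character column
df_1 <- data.frame(
  x = rep(c("b", "a"), each = 2),
  y = rep(1:2, each = 2),
  stringsAsFactors = FALSE
)

# Mean of y within each value of x; tapply() labels results by group value
group_means <- tapply(df_1$y, df_1$x, mean)
print(group_means)
```

Group a should have mean 2 and group b should have mean 1, which is the opposite of the labels printed by frq() for the grouped tibble.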

We can compare this to a similar operation, in which tibble::tibble has been replaced by base::data.frame:

df_2 <- data.frame(
  x = rep(c("b", "a"), each = 2),
  y = rep(1:2, each = 2)
)

df_2
frq(group_by(df_2, x))

Here, frq(group_by(df_2, x)) generates the following output:

> frq(group_by(df_2, x))

Grouped by:
x: a
 
# y <integer> 
# total N=2  valid N=2  mean=2.00  sd=0.00
 
  val frq raw.prc valid.prc cum.prc
    2   2     100       100     100
 <NA>   0       0        NA      NA

Grouped by:
x: b
 
# y <integer> 
# total N=2  valid N=2  mean=1.00  sd=0.00
 
  val frq raw.prc valid.prc cum.prc
    1   2     100       100     100
 <NA>   0       0        NA      NA

> 

As we can see, the two groups a and b do not display the same values in the two outputs.

Version info:
R version 3.5.2 (2018-12-20)
tibble 2.0.1
dplyr 0.8.0.1
sjmisc 2.7.7

Thanks! This is due to reordering when the grouping column (x in this case) is a character vector, and not a factor. If you use data.frame() with stringsAsFactors = FALSE, you get the same error. I fixed this, and will commit later.
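To illustrate how this kind of reordering can swap labels, here is a minimal base R sketch. It is not the actual sjmisc code, only an assumption about the general mechanism: a character vector yields its values in order of first appearance, while grouping sorts them, so pairing one ordering with the other mislabels the groups. All variable names are hypothetical.

```r
x <- rep(c("b", "a"), each = 2)
y <- rep(1:2, each = 2)

# A character vector gives its distinct values in order of first appearance
appearance_order <- unique(x)

# Factor levels (and therefore groups) are sorted instead
sorted_order <- levels(factor(x))

# split() returns the chunks in sorted group order, correctly named
chunks <- split(y, x)
correct_means <- sapply(chunks, mean)

# Pairing the sorted chunks with appearance-order labels swaps the names
mislabelled <- setNames(chunks, appearance_order)
wrong_means <- sapply(mislabelled, mean)

print(correct_means)
print(wrong_means)
```

In wrong_means, group b is reported with mean 2, matching the incorrect frq() output for the grouped tibble. Until the fix is available, converting the grouping column to a factor before grouping, for example with dplyr::mutate(df_1, x = factor(x)), should presumably avoid the problem, since factor grouping columns are not affected.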