sfirke/janitor

Provide sort argument to tabyl()

richierocks opened this issue · 8 comments

In the same way that dplyr::count() has a sort argument, which sorts the results in descending order of count, it would be helpful if tabyl() had an equivalent sort argument.

In the 1-dimensional case, the behavior is fairly straightforward.

library(janitor)
mtcars %>% 
  tabyl(cyl, sort = TRUE)

would be equivalent to

library(janitor)
library(dplyr)
mtcars %>% 
  tabyl(cyl) %>%
  arrange(desc(n))
#>  cyl  n percent
#>    8 14 0.43750
#>    4 11 0.34375
#>    6  7 0.21875

For two dimensions, it's a little trickier. I think the best behavior is to only sort on the first dimension. That is,

mtcars %>% 
  tabyl(cyl, gear, sort = TRUE)

would be equivalent to

core <- mtcars %>% 
  tabyl(cyl, gear) %>% 
  attr(., "core")
ord <- order(rowSums(core[-1]), decreasing = TRUE)  # skip the cyl column; sum only the counts
core[ord, ]
#>   cyl  3 4 5
#> 3   8 12 0 2
#> 1   4  1 8 2
#> 2   6  2 4 1

You might want to let the user choose which dimension to sort on. For example, sort_dim = 1 would give the behavior above, sort_dim = 2 would reorder the columns by decreasing column sums, and sort_dim = NA (the default) would mean no sorting. I'm not sure whether that makes the interface too complicated, though.
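To make the proposed sort_dim semantics concrete, here is a minimal base-R sketch. sort_by_dim() is a hypothetical helper, not part of janitor's API, and it operates on a plain table() rather than a tabyl, so it does not deal with keeping a tabyl's "core" attribute in sync.

```r
# Hypothetical helper illustrating the proposed `sort_dim` semantics on a
# base-R contingency table. This is NOT part of janitor's API.
sort_by_dim <- function(tab, sort_dim = NA) {
  if (is.na(sort_dim)) {
    return(tab)  # default: leave the table unsorted
  }
  if (sort_dim == 1) {
    # reorder rows by decreasing row totals
    return(tab[order(rowSums(tab), decreasing = TRUE), , drop = FALSE])
  }
  if (sort_dim == 2) {
    # reorder columns by decreasing column totals
    return(tab[, order(colSums(tab), decreasing = TRUE), drop = FALSE])
  }
  stop("sort_dim must be NA, 1, or 2")
}

tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
sort_by_dim(tab, sort_dim = 1)
sort_by_dim(tab, sort_dim = 2)
```

With the mtcars counts, sort_dim = 1 puts the cyl rows in the order 8, 4, 6, matching the example output above. The gear columns already happen to be in decreasing order of their totals (15, 12, 5), so sort_dim = 2 leaves them unchanged here.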

It's funny: back when one-way tabyl was a separate function, it had a sort argument exactly like that. Then, when it was merged with two-way tabyl, I didn't see a clear way to implement sort, so I removed it. Some past discussion is at #351. I appreciate your example of what sort could mean for two-way tabyls and would be curious to hear from other users whether that kind of two-way sort is something they do. If so, the advantage of a sort argument is that it would also sort the underlying core, so later adorn_ functions would still work.

I could be convinced; I do like sort for one-way tabyls. But it clutters the interface for two-way tabyls, and it's not so bad to type %>% arrange(desc(n)).

My two cents: I find it much simpler to just use %>% arrange(desc(n)) when I want to reorder something. A sort argument for a two-way table that either sorts only on the first dimension or requires the user to specify a dimension feels less intuitive and adds unnecessary complexity.

If it makes any difference, here's the context on why I want this.

I'm trying to teach exploration of categorical variables to some fairly new R users, and I want to end up with something like

library(dplyr)
mtcars %>% 
  count(carb, sort = TRUE) %>% 
  mutate(percent = 100 * n / sum(n))

So it's just the 1-way case, but it means I have to explain a lot of subtleties like "where did that n column come from?", and "why is percent calculated like that?".

I was hoping to use janitor to avoid those sorts of code discussions and just focus on the dataset. Unfortunately, the equivalent janitor code is still two steps and takes some thought for people who aren't confident with data manipulation.

library(dplyr)
library(janitor)
mtcars %>% 
  tabyl(carb) %>% 
  arrange(desc(n))

Since I'm optimizing for code that requires minimal explaining, it seems like

library(janitor)
mtcars %>% 
  tabyl(carb, sort = TRUE)

or possibly

library(janitor)
mtcars %>% 
  tabyl(carb, sort_dim = 1)

would work really nicely.

Please add it back in, or fix sorting: dplyr::arrange(desc()) does work to an extent, but it breaks. Here's what I've found:
tabyl() %>% arrange(desc()) # that's fine

tabyl() %>% arrange(desc()) %>% adorn_totals() # also fine

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() # still fine

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() # table's looking nice....

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() %>% adorn_ns() # order is now completely screwed up.

The only way around this I've found is to use adorn_ns(position = "front"), which sort of lets me use arrange(desc()) afterwards, but then the "Total" row is no longer at the bottom.

Apologies if I've failed to read something I should have read before posting.

@pstils, is your example a two-way tabyl, like mtcars %>% tabyl(am, cyl)? If so, I think this should be fixed via this other issue: #407. It goes into a lot of detail, but basically, if the problem is that the original, unsorted Ns are being adorned onto the now-sorted tabyl, I think I can fix it there without adding a sort argument to tabyl().
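The failure mode described here, where a tabyl's displayed rows are re-sorted but the stored counts used by later adorn_ functions are not, can be illustrated with plain data frames. The sketch below is hypothetical: reorder_with_core() is not a janitor function and does not reproduce janitor's internals or its fix for #407. It only shows the general idea of applying the same row order to the displayed data and to a "core" attribute, so that anything reading the core later stays aligned.

```r
# Hypothetical helper (not part of janitor): reorder the rows of a data frame
# and apply the same row order to its "core" attribute, if it has one.
reorder_with_core <- function(dat, ord) {
  core <- attr(dat, "core")
  out <- dat[ord, , drop = FALSE]
  if (!is.null(core)) {
    # Overwrite whatever `[` kept (or dropped) with a core sorted the same way
    attr(out, "core") <- core[ord, , drop = FALSE]
  }
  out
}

# A displayed table of the mtcars `am` counts, with the raw counts stored
# alongside it in a "core" attribute
shown <- data.frame(am = c("0", "1"), n = c("19", "13"))
attr(shown, "core") <- data.frame(am = c(0, 1), n = c(19, 13))

# Sort by ascending count using the numeric core, not the display column
ord <- order(attr(shown, "core")$n)
sorted <- reorder_with_core(shown, ord)
sorted$am                # display order
attr(sorted, "core")$am  # core order, still aligned with the display
```

If only the displayed rows were reordered, the core would still list am = 0 first, and anything that pastes core values back onto the display row by row would attach them to the wrong rows.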

@pstils I just pushed an update to the main branch that I think will address what you're talking about. See if your example above now works after you install the dev version from GitHub?

Example:

library(janitor)
library(dplyr)
mtcars %>%
  tabyl(am, cyl) %>%
  arrange(desc(`4`)) %>%
  adorn_totals() %>%
  adorn_percentages() %>%
  adorn_pct_formatting() %>%
  adorn_ns()

    am          4         6          8
     1 61.5%  (8) 23.1% (3) 15.4%  (2)
     0 15.8%  (3) 21.1% (4) 63.2% (12)
 Total 34.4% (11) 21.9% (7) 43.8% (14)

@richierocks, sorry for the slow response and for this drifting away from what you brought up. I don't plan to re-implement a sort argument, so I will close this issue. I'm afraid arrange() is the best bet for now, though I agree it's a little trickier for beginners.

@sfirke Thanks Sam, despite my non-reprex, that was exactly the situation. I've got the dev version, and it now works exactly as intended with arrange(desc()) after tabyl(). Thank you again.