Provide sort argument to tabyl()
richierocks opened this issue · 8 comments
In the same way that dplyr::count() has a sort argument, which sorts the results in descending order of count, it would be helpful if tabyl() had an equivalent sort argument.
In the 1-dimensional case, the behavior is fairly straightforward.
library(janitor)
mtcars %>%
tabyl(cyl, sort = TRUE)
would be equivalent to
library(janitor)
library(dplyr)
mtcars %>%
tabyl(cyl) %>%
arrange(desc(n))
#> cyl n percent
#> 8 14 0.43750
#> 4 11 0.34375
#> 6 7 0.21875
For two dimensions, it's a little trickier. I think the best behavior is to only sort on the first dimension. That is,
mtcars %>%
tabyl(cyl, gear, sort = TRUE)
would be equivalent to
core <- mtcars %>%
tabyl(cyl, gear) %>%
attr(., "core")
ord <- order(rowSums(core), decreasing = TRUE)
core[ord, ]
#> cyl 3 4 5
#> 3 8 12 0 2
#> 1 4 1 8 2
#> 2 6 2 4 1
You might want to allow the user to choose which dimension to sort on. For example, setting sort_dim = 1 would give the previous behavior, sort_dim = 2 would sort the column order by decreasing column sums, and sort_dim = NA would be the default of no sorting. I'm not sure if that is too complicated an interface though.
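To make the proposed sort_dim semantics concrete, here is a minimal base R sketch. The helper name sort_counts is hypothetical and is not part of janitor; it operates on a plain contingency table from table() rather than on a tabyl, so it only illustrates the intended ordering, not how tabyl() would implement it.

```r
# Hypothetical helper illustrating the proposed sort_dim semantics.
# It works on a base R contingency table, not on a tabyl, and is not part of janitor.
sort_counts <- function(tab, sort_dim = NA) {
  if (is.na(sort_dim)) {
    return(tab)                                   # default: leave the order alone
  }
  if (sort_dim == 1) {
    ord <- order(rowSums(tab), decreasing = TRUE)
    return(tab[ord, , drop = FALSE])              # reorder rows by decreasing row totals
  }
  if (sort_dim == 2) {
    ord <- order(colSums(tab), decreasing = TRUE)
    return(tab[, ord, drop = FALSE])              # reorder columns by decreasing column totals
  }
  stop("sort_dim must be NA, 1, or 2")
}

tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
by_row <- sort_counts(tab, sort_dim = 1)  # rows ordered by total count per cyl
by_col <- sort_counts(tab, sort_dim = 2)  # columns ordered by total count per gear
```

With sort_dim = 1 the rows come out in the same cyl order as the core[ord, ] example above.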
It's funny, back when one-way tabyl was a separate function it had a sort argument exactly like that. Then when it got merged with two-way tabyl, I didn't see a clear way to implement sort and so I removed it. Some past discussion is at #351. I appreciate your example of what sort could mean in the context of two-way tabyls and would be curious to hear from other users if such two-way sort is something they do. If so, the advantage to sort is that it would also sort the underlying core so that later adorn_ functions would work.
I could be convinced; I do like sort for one-way tabyls. But then it clutters the interface for two-way, and it's not so bad to type %>% arrange(desc(n)).
My two cents: I find that it's much simpler to just use %>% arrange(desc(n)) when I want to reorder something. I feel like adding a sort argument for a two-way table, one that either sorts on the first dimension or requires specifying a dimension, is less intuitive and adds unnecessary complexity.
If it makes any difference, here's the context on why I want this.
I'm trying to teach exploration of categorical variables to some fairly new R users, and I want to end up with something like
library(dplyr)
mtcars %>%
count(carb, sort = TRUE) %>%
mutate(percent = 100 * n / sum(n))
So it's just the 1-way case, but it means I have to explain a lot of subtleties like "where did that n column come from?" and "why is percent calculated like that?".
I was hoping to use janitor to avoid those sorts of code discussions and just focus on the dataset. Unfortunately, the equivalent janitor code is still two lines and requires a little thought from people who aren't that confident with data manipulation.
library(dplyr)
library(janitor)
mtcars %>%
tabyl(carb) %>%
arrange(desc(n))
Since I'm optimizing for code that requires minimal explaining, it seems like
library(janitor)
mtcars %>%
tabyl(carb, sort = TRUE)
or possibly
library(janitor)
mtcars %>%
tabyl(carb, sort_dim = 1)
would work really nicely.
Please add back in, or fix sorting: dplyr::arrange(desc()) does work to an extent, but it does break; here's what I've found:
tabyl() %>% arrange(desc())
# that's fine
tabyl() %>% arrange(desc()) %>% adorn_totals()
# also fine
tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages()
# still fine
tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting()
# table's looking nice....
tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() %>% adorn_ns()
# order is now completely screwed up.
The only way around this I've found is to use adorn_ns(position = "front"), which kind of allows me to then use arrange(desc()) afterwards, but then the "Total" row is no longer at the bottom.
Apologies if I've failed to read something I should have read before posting.
@pstils is your example a two-way tabyl, like mtcars %>% tabyl(am, cyl)? If so, I think this should be fixed via this other issue: #407. It's a very detailed look, but basically, if the problem is that the original non-sorted Ns are adorning onto the now-sorted tabyl, I think I can fix it there without adding a sort argument to tabyl().
@pstils I just pushed an update to the main branch that I think will address what you're talking about. See if your example above now works after you install the dev version from GitHub?
Example:
mtcars %>% tabyl(am, cyl) %>% arrange(desc(`4`)) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() %>% adorn_ns()
am 4 6 8
1 61.5% (8) 23.1% (3) 15.4% (2)
0 15.8% (3) 21.1% (4) 63.2% (12)
Total 34.4% (11) 21.9% (7) 43.8% (14)
@richierocks sorry for the slow response and for this getting away from what you brought up. I don't plan to re-implement a sort argument, so I will close this issue. I'm afraid arrange() is the best bet for now, though I agree it's a little trickier for beginners.