tabyl sorts the table column by the integer 'name' not the number
daaronr opened this issue · 1 comments
Bug
Tabyl (with 2 arguments) sorts the column of the table by the integer 'name' not the number. I want the opposite, of course.
Brief description of the problem
library(janitor)
library(dplyr)
df1 <- data.frame(var1 = 1:10, var2 = 1:10)
df1 %>% tabyl(var1, var2)
Yields:
 var1 1 10 2 3 4 5 6 7 8 9
    1 1  0 0 0 0 0 0 0 0 0
    2 0  0 1 0 0 0 0 0 0 0
    3 0  0 0 1 0 0 0 0 0 0
    4 0  0 0 0 1 0 0 0 0 0
    5 0  0 0 0 0 1 0 0 0 0
    6 0  0 0 0 0 0 1 0 0 0
    7 0  0 0 0 0 0 0 1 0 0
    8 0  0 0 0 0 0 0 0 1 0
    9 0  0 0 0 0 0 0 0 0 1
   10 0  1 0 0 0 0 0 0 0 0
Like a prison inmate, I'm an integer, sort me by a number, not a name please!
Agreed this is not the desired behavior, thanks for reporting. And it also extends to the sorting of 3-way tabyls:
library(janitor)
library(dplyr)
data.frame(var1 = 1:10, var2 = 1:10, var3 = 1:10) %>%
  mutate(var2 = ordered(var2, levels = var2)) %>%
  tabyl(var1, var2, var3)
The list of tabyls goes 1, 10, etc.
The fix
We can take advantage of the fact that factors already get sorted correctly, and just make the numerics into factors upstream of that.
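To illustrate the difference, here is a minimal base-R sketch (not janitor's internal code; the variable names are illustrative). Names stored as character strings sort lexicographically, while an ordered factor built from the numeric values keeps numeric order in its levels.

```r
# Values as they appear in the data
x <- 1:10

# Column names are character strings, so sorting them is lexicographic:
# "10" lands right after "1". method = "radix" sorts in the C locale,
# which keeps the result independent of the session's locale.
char_sorted <- sort(as.character(x), method = "radix")

# An ordered factor built from the numeric values keeps numeric order in its levels
f <- ordered(x, levels = sort(unique(x)))
factor_levels <- levels(f)

print(char_sorted)
print(factor_levels)
```

The first vector reproduces the 1, 10, 2, ... ordering from the report; the factor levels run 1 through 10.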
For the 2-way issue, I think this could be fixed by adding something like if (is.numeric(tabl[[2]])) { tabl[[2]] <- ordered(tabl[[2]], levels = sort(unique(tabl[[2]]))) } at the top of the block here. Then it would be treated as an ordered factor, taking advantage of that existing code from that point on.
For the 3-way issue, I think that same block could be added at https://github.com/sfirke/janitor/blob/master/R/tabyl.R#L232, but referring to dat[[3]].
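The conversion described for both cases could be sketched as one small helper. This is only an illustration in base R: numeric_col_to_ordered is a hypothetical name, not a janitor function, and the real fix would go inside tabyl's existing code paths. The levels are taken from the sorted unique values, on the assumption that the column may contain repeated values.

```r
# Hypothetical helper, not part of janitor: if column `i` of `df` is numeric,
# turn it into an ordered factor whose levels follow numeric order.
numeric_col_to_ordered <- function(df, i) {
  if (is.numeric(df[[i]])) {
    df[[i]] <- ordered(df[[i]], levels = sort(unique(df[[i]])))
  }
  df
}

dat <- data.frame(var1 = 1:10, var2 = 1:10, var3 = 10:1)
dat <- numeric_col_to_ordered(dat, 2)  # second variable (2-way case)
dat <- numeric_col_to_ordered(dat, 3)  # third variable (3-way case)

# Base R's table() lays out its columns in factor-level order
col_order <- colnames(table(dat$var1, dat$var2))
print(col_order)
print(levels(dat$var3))

# Non-numeric columns pass through unchanged
chr <- numeric_col_to_ordered(data.frame(a = c("x", "y")), 1)
```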
I marked this "good first issue" and am hopeful someone newer to R development could take a shot at it, especially with the pointers above. It will also require tests to make sure the fix works; I can assist there if needed.