sfirke/janitor

tabyl sorts the table column by the integer 'name' not the number

daaronr opened this issue · 1 comments

Bug

tabyl() with two variables sorts the columns of the table by the integers' names (as text: 1, 10, 2, ...) rather than by their numeric values. I want the opposite, of course.


Brief description of the problem

library(janitor)
library(dplyr)  # for %>%

df1 <- data.frame(var1 = 1:10,
                  var2 = 1:10)

df1 %>% tabyl(var1, var2)

Yields:

var1 1 10 2 3 4 5 6 7 8 9
    1 1  0 0 0 0 0 0 0 0 0
    2 0  0 1 0 0 0 0 0 0 0
    3 0  0 0 1 0 0 0 0 0 0
    4 0  0 0 0 1 0 0 0 0 0
    5 0  0 0 0 0 1 0 0 0 0
    6 0  0 0 0 0 0 1 0 0 0
    7 0  0 0 0 0 0 0 1 0 0
    8 0  0 0 0 0 0 0 0 1 0
    9 0  0 0 0 0 0 0 0 0 1
   10 0  1 0 0 0 0 0 0 0 0

Like a prison inmate, I'm an integer, sort me by a number, not a name please!
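
For reference, the column order in the output above (1, 10, 2, ...) is what you get when the values are sorted as character strings rather than as numbers. A small base R illustration of the difference follows; it is not tied to tabyl's internals, and the variable names are made up for the example:

```r
x <- 1:10

# Sorting the values as character strings compares them character by
# character, so "10" lands right after "1".
as_text <- sort(as.character(x), method = "radix")

# Sorting the same values as numbers gives the expected order.
as_number <- sort(x)

print(as_text)
print(as_number)
```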

Agreed this is not the desired behavior, thanks for reporting. And it also extends to the sorting of 3-way tabyls:

library(janitor)
library(dplyr)

data.frame(var1 = 1:10, var2 = 1:10, var3 = 1:10) %>%
  mutate(var2 = ordered(var2, levels = var2)) %>%
  tabyl(var1, var2, var3)

The list of tabyls is ordered 1, 10, etc., rather than numerically.

The fix

We can take advantage of the fact that factors already get sorted correctly, and just make the numerics into factors upstream of that.

For the 2-way issue, I think this could be fixed by adding something like if (is.numeric(tabl[[2]])) { tabl[[2]] <- ordered(tabl[[2]], levels = tabl[[2]]) } at the top of the block here. From that point on it would be treated as an ordered factor, taking advantage of the existing code.
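
A minimal sketch of that idea in base R only. The helper name numeric_to_ordered is hypothetical and not part of janitor. It uses sort(unique(x)) for the levels, which is an assumption made here so the sketch also works when the column holds repeated values, since factor levels must be unique:

```r
# Hypothetical helper, not part of janitor: turn a numeric vector into an
# ordered factor whose levels follow numeric order.
numeric_to_ordered <- function(x) {
  if (is.numeric(x)) {
    # sort(unique(x)) keeps the levels unique and in numeric order
    x <- ordered(x, levels = sort(unique(x)))
  }
  x
}

vals <- c(10, 2, 1, 2, 9)
f <- numeric_to_ordered(vals)

# Levels follow numeric order rather than text order.
print(levels(f))

# Non-numeric input is returned unchanged.
print(numeric_to_ordered(c("b", "a")))
```

Whether this belongs in a helper or inline in tabyl.R is a design choice for whoever picks up the issue; the helper form just makes it easier to test in isolation.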

For the 3-way issue, I think that same block could be added at https://github.com/sfirke/janitor/blob/master/R/tabyl.R#L232, but referring to dat[[3]].
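
To illustrate why converting the third variable should help, here is an analogy using base R's split(). It does not reproduce how tabyl builds its list of tables; it only shows that list names follow factor level order when the grouping variable is an ordered factor. The object names are made up for the example:

```r
dat <- data.frame(var1 = 1:10, var2 = 1:10, var3 = 1:10)

# Grouping on a character version of var3 yields list names in text order.
by_text <- split(dat, as.character(dat$var3))

# Grouping on an ordered factor yields list names in numeric order.
var3_ordered <- ordered(dat$var3, levels = sort(unique(dat$var3)))
by_factor <- split(dat, var3_ordered)

print(names(by_text))
print(names(by_factor))
```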

I marked this "good first issue" and am hopeful someone newer to R development could take a shot at it, especially with the pointers above. It will also require tests to make sure the fix works; I can assist there if needed.