smartinsightsfromdata/rpivotTable

factor levels order is not retained

Factors are handy when you want strings in a specific order, and plotting libraries and functions commonly respect that order. I don't see a reason why a pivot table should behave differently.
The example below shows that the order of levels is ignored. This feature request is about ordering row and column entries according to the order of the factor levels.

library(rpivotTable)

df = data.frame(name = factor(c("b","a","b","b","a"), levels=c("b","a")),
                grp = factor(c("x","x","y","y","y"), levels=c("y","x")),
                val = 1:5)
rpivotTable(df,
            rows = "name",
            cols = "grp",
            aggregatorName = "Average",
            vals = "val")

Factor levels could automatically populate the sorters argument.

This would have to be passed to the JS layer via sorters.

Hello, I've tried the solution proposed in #106, but in my case it still won't give the correct result. See my example:

set.seed(123)
library(dplyr)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
) %>% 
  mutate(x_cut = cut(x,5),
         y_cut = cut(y,5))

# desired ordering
# x order: (-23.9,-15.2] (-15.2,-6.58] (-6.58,2.06] (2.06,10.7] (10.7,19.4]
# y order: (-19.9,-11.9] (-11.9,-3.97] (-3.97,3.96] (3.96,11.9] (11.9,19.9]

rpivotTable(dat) # wrong order, no sorter

# solution
make_sorters <- function(data) {
  if( !length(data) ) return(NULL)
  f <- sapply(data, is.factor)
  if( !sum(f) ) return(NULL)
  fcols <- names(data)[f]
  flvls <- sapply(fcols, function(fcol, data) levels(data[[fcol]]), data=data, simplify=FALSE)
  jslvls <- sapply(flvls, function(lvls) paste(paste0("\"",lvls,"\""), collapse=", "))
  sorter <- sprintf("if (attr == \"%s\") { return sortAs([%s]); }", fcols, jslvls)
  sprintf("function(attr) {\nvar sortAs = $.pivotUtilities.sortAs;\n%s\n}", paste(sorter, collapse="\n"))
}
s <- make_sorters(dat)

rpivotTable(dat, sorter = s) # wrong order, with sorter

I'm running:

packageVersion("rpivotTable")
[1] ‘0.3.0’

Thanks!

@rlavelli your desired output does not seem to correspond to your data. Are you on an old R version with a different random number generator? Please include sessionInfo(). Also, dplyr seems irrelevant here; it is best to strip out unrelated code to make sure it is not interfering.

set.seed(123)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
)
dat$x_cut = cut(dat$x,5)
dat$y_cut = cut(dat$y,5)

will do.

Also, please provide the output of levels(dat$x_cut) and levels(dat$y_cut).
Did you actually install the factor-sorters branch? The package logic itself changed there, so it is not only a matter of passing the sorter argument. Note that this argument does not exist in the branch; there is a sorters argument instead.
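A quick way to see which of the two argument names the installed build exposes is to inspect the function's formals. This is only a sketch: `has_arg` is a hypothetical helper written for this comment, not something the package provides, and the check is skipped if rpivotTable is not installed.

```r
# Hypothetical helper (not part of rpivotTable):
# TRUE if function `fn` has a formal argument named `arg`.
has_arg <- function(fn, arg) arg %in% names(formals(fn))

# Only inspect the package if it is actually installed.
if (requireNamespace("rpivotTable", quietly = TRUE)) {
  f <- rpivotTable::rpivotTable
  message("has 'sorter':  ", has_arg(f, "sorter"))
  message("has 'sorters': ", has_arg(f, "sorters"))
}
```

If only `sorter` is reported, the factor-sorters branch is most likely not the build that is loaded.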

Thank you for the reply, and sorry to bother.
No, I didn't install the actual branch; I was under the impression that the new make_sorters function would suffice. I'll try to give an update about that.

For the sake of completeness, here's the info you asked for (in a clean R session).

set.seed(123)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
)
dat$x_cut = cut(dat$x,5)
dat$y_cut = cut(dat$y,5)

levels(dat$x_cut)
# [1] "(-19.7,-12.2]" "(-12.2,-4.65]" "(-4.65,2.86]"  "(2.86,10.4]"   "(10.4,17.9]"  
levels(dat$y_cut)
# [1] "(-15.5,-8.05]"  "(-8.05,-0.617]" "(-0.617,6.82]"  "(6.82,14.3]"    "(14.3,21.7]"   

# sessionInfo()
# R version 3.6.1 (2019-07-05)

I was actually able to fix the sorting problem in my actual case by prefixing each cut label with an increasing number, like: "1 (-19.7,-12.2]", "2 (-12.2,-4.65]", "3 (-4.65,2.86]", "4 ..". It's not pretty, but it gives the correct result even without the use of sorter.

I'll try to install the full branch update and test it. Again, thank you.

Your workaround basically avoids the problem in the first place. As stated in this issue, alphabetical order is used instead of the order of levels, so adding a numeric prefix sidesteps the issue rather than fixing it.
Note that your initial report included x order: (-23.9,-15.2] ... which was probably generated with a different random seed. The updated one looks fine.
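To make the difference concrete, here is a minimal base R sketch using the x_cut labels posted above. It assumes a plain byte-wise string sort (`method = "radix"`) as a stand-in for alphabetical ordering; the widget's own default ordering may differ in detail. The variable names are made up for this illustration.

```r
# Cut labels copied from the levels(dat$x_cut) output earlier in this thread.
lv <- c("(-19.7,-12.2]", "(-12.2,-4.65]", "(-4.65,2.86]",
        "(2.86,10.4]", "(10.4,17.9]")
x_cut <- factor(lv, levels = lv)

# Byte-wise sort of the labels; method = "radix" sorts in the C locale,
# so the result does not depend on the session's collation settings.
alphabetical <- sort(as.character(x_cut), method = "radix")
level_order_kept <- identical(alphabetical, levels(x_cut))

# The workaround: prefix each label with its level index.
prefixed <- paste(seq_along(lv), lv)
prefix_order_kept <- identical(sort(prefixed, method = "radix"), prefixed)

print(level_order_kept)   # string sort vs. factor level order
print(prefix_order_kept)  # string sort of the prefixed labels
```

With ten or more levels a single-digit prefix would stop sorting correctly as a string, so the index would need zero-padding, for example with `sprintf("%02d", ...)`.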

To install this branch easily, you can use the remotes or devtools package.

remotes::install_github("jangorecki/rpivotTable@factor-sorters")

Please report back if you still have a problem even when using this branch.

@rlavelli any news on whether the branch addresses your case?

I'm sorry for the delay. I've tried the full branch and it works. Thank you!