ggobi/ggally

Make `ggparcoord` (and helper methods) into a separate package

Opened this issue · 4 comments

ggparcoord() depends on {scagnostics}, which is difficult to install. It is such a pain to deal with!

If we made a new package ({ggparcoord}) that had {scagnostics} as a Suggests dependency, then installing {ggally} would, by default, no longer require {scagnostics}.

Users who do not need {scagnostics} would be happier not having to install it. Users who do need the {scagnostics} package would be opting into installing it.
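As a rough illustration of the opt-in idea, an optional (Suggests) dependency is usually checked at the moment the feature is used rather than at install time. The sketch below uses only base R; `require_optional()`, `order_needs_scagnostics()`, and `check_order()` are hypothetical names made up for this example and are not part of {ggally}.

```r
# Hypothetical helper: fail with an actionable message when an optional
# (Suggests) package is missing, instead of requiring it at install time.
require_optional <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is needed for this feature. ",
      "Install it with install.packages(\"", pkg, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical predicate: only the scagnostic-based orderings need the package.
order_needs_scagnostics <- function(order) {
  order %in% c(
    "Outlying", "Skewed", "Clumpy", "Sparse", "Striated",
    "Convex", "Skinny", "Stringy", "Monotonic"
  )
}

# Hypothetical entry point: the check runs only when a user asks for a
# scagnostic ordering, so everyone else never needs {scagnostics}.
check_order <- function(order) {
  if (order_needs_scagnostics(order)) {
    require_optional("scagnostics")
  }
  invisible(order)
}
```

With this shape, `check_order("skewness")` succeeds on any machine, while `check_order("Clumpy")` stops with an install hint only if {scagnostics} is absent.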

cc @92amartins

What about using {cassowaryr} instead? https://github.com/numbats/cassowaryr

Hey, @harriet-mason!

We are considering using your package (cassowaryr) to replace the scagnostics calculation provided by the package of the same name.

Do you think that would work well?

We specifically use the package in this block of code:

ggally/R/ggparcoord.R

Lines 467 to 473 in 1f58feb

  } else if (order %in% c(
    "Outlying", "Skewed", "Clumpy", "Sparse", "Striated", "Convex", "Skinny",
    "Stringy", "Monotonic"
  )) {
    require_namespaces("scagnostics")
    scag <- scagnostics::scagnostics(saveData2)
    data.m$variable <- factor(data.m$variable, levels = scag_order(scag, names(saveData2), order))

Hey @92amartins,

So, cassowaryr can calculate those scagnostics; however, you will likely get different results from the scagnostics package.

Unlike scagnostics, cassowaryr does not perform binning, which means points can get very close together, leading to infinitesimally small minimum spanning tree (MST) lengths. This means any scagnostic that uses MST lengths in the denominator of its calculation has a tendency to be quite volatile and give unpredictable results. We tried to design more robust scagnostics to prevent these issues (such as clumpy2 and striated2); however, the calculations used by those scagnostics are fundamentally different from those in the original scagnostics paper by Leland Wilkinson and colleagues. Binning is something we have been hoping to implement, but we haven't had time yet.
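To make the binning point concrete, here is a minimal base R sketch, not taken from either package. It computes MST edge lengths with Prim's algorithm and compares raw points against points snapped to a square grid. The square grid is only a crude stand-in assumed for illustration and is not the binning that scagnostics actually performs; `mst_edge_lengths()` and the cell size are likewise made up for this example.

```r
# Edge lengths of the Euclidean minimum spanning tree, via Prim's algorithm.
mst_edge_lengths <- function(xy) {
  d <- as.matrix(dist(xy))
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best <- d[1, ]  # shortest known distance from the tree to each point
  lengths <- numeric(n - 1)
  for (i in seq_len(n - 1)) {
    best[in_tree] <- Inf        # never re-select points already in the tree
    j <- which.min(best)        # closest point outside the tree
    lengths[i] <- best[j]
    in_tree[j] <- TRUE
    best <- pmin(best, d[j, ])  # update distances using the new tree point
  }
  lengths
}

set.seed(1)
xy <- cbind(runif(50), runif(50))
xy <- rbind(xy, xy[1, ] + 1e-9)  # add a near-duplicate of the first point

# Crude stand-in for binning (an assumption for illustration only):
# snap points to a square grid and drop duplicates.
cell <- 0.05
binned <- unique(round(xy / cell) * cell)

raw_min <- min(mst_edge_lengths(xy))
binned_min <- min(mst_edge_lengths(binned))

# Any ratio with the shortest edge in the denominator explodes without binning.
print(c(raw = 1 / raw_min, binned = 1 / binned_min))
```

Without the grid, the shortest MST edge is the tiny gap between the near-duplicate points. After snapping, distinct points are at least one cell apart, so the same ratio stays bounded.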

Additionally, you would need to use the most recent development version of the package. We had a series of issues with changing dependencies that broke the package a couple of times, so the version on CRAN may throw errors for some scatter plots that the scagnostics package would have had no issue with. The current GitHub version should not have this problem, and we are going to do some additional checks before resubmitting the package to CRAN. Ultimately, whether or not cassowaryr would work well here depends on whether these trade-offs are better or worse than trying to install scagnostics haha.
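For anyone wanting to try the development version, a sketch along these lines should work, assuming the {remotes} package is available. The repository name comes from the URL earlier in this thread. The `calc_scags(x, y, scags = ...)` call and the scagnostic names passed to it are my assumption about the cassowaryr interface and should be checked against the package documentation.

```r
# Install the development version from GitHub (left commented out so that
# running this sketch does not change your library):
# install.packages("remotes")
# remotes::install_github("numbats/cassowaryr")

# Toy data for a quick look at the results.
set.seed(1)
x <- runif(100)
y <- x + rnorm(100, sd = 0.1)

if (requireNamespace("cassowaryr", quietly = TRUE)) {
  # Assumed interface: calc_scags(x, y, scags = <character vector of names>).
  print(cassowaryr::calc_scags(x, y, scags = c("outlying", "monotonic", "clumpy2")))
} else {
  message("cassowaryr is not installed; skipping the comparison.")
}
```

Running the same data through scagnostics::scagnostics() would then show how far the two sets of values differ for a given plot.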

Good to know. Thanks for the input on that!