ropensci/skimr

Onboarding: naming and related choices for min and max, p0, p100

Closed this issue · 2 comments

elinw commented

We've already had an extended discussion in the tracker about this, and it's clear that there is no consensus about what is "least surprising" in the edge cases where this matters.

There are three issues:

  • what functions to use (percentiles or min and max).
  • how to display the names.
  • implicitly what to do about the median.

These are separate questions, and the way summary() works shows that core R is inconsistent on them as well.
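For illustration, here is a minimal base R sketch of the edge case, assuming an all-NA numeric vector with the NAs removed before summarising. The variable names are made up for the example, and none of this is skimr code.

```r
# An all-NA numeric vector: the edge case under discussion.
x <- c(NA_real_, NA_real_)

# min() and max() fall back to Inf and -Inf (with a warning)
# once the NAs have been dropped and nothing is left.
lo <- suppressWarnings(min(x, na.rm = TRUE))   # Inf
hi <- suppressWarnings(max(x, na.rm = TRUE))   # -Inf

# quantile() and median() return NA for the same input.
q   <- quantile(x, probs = c(0, 0.5, 1), na.rm = TRUE)
med <- median(x, na.rm = TRUE)

# summary() labels its end points "Min." and "Max.";
# compare what it reports with lo and hi above.
s <- summary(x)
```

Comparing `s[["Min."]]` with `lo` is one way to see the inconsistency: the label says minimum, but the value does not have to match what min() returns.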

The long discussion is in the tracker at #224.

elinw commented

I guess I'm going to make the argument that maintaining transitivity is not particularly important in the context of skimming. Inf and -Inf can be generated both by things like dividing by 0 and by min() and max() when no non-missing values remain, but NA accurately represents what is in the data when all of the values are NA. If there is an actual value of Inf or -Inf, then that should be shown.
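A small base R sketch of the distinction, assuming the percentiles are computed with quantile() and na.rm = TRUE (the vectors are invented for the example):

```r
# A vector that really contains Inf, here produced by dividing by 0.
real_inf <- c(1 / 0, 2, NA)

# The 0th and 100th percentiles keep the genuine Inf in the data.
ends_real <- quantile(real_inf, probs = c(0, 1), na.rm = TRUE, names = FALSE)
# ends_real is c(2, Inf)

# A vector with nothing but NA: there is no Inf in the data at all.
all_na <- c(NA_real_, NA_real_)

# The percentiles report NA, which matches what the data contains.
ends_na <- quantile(all_na, probs = c(0, 1), na.rm = TRUE, names = FALSE)

# min() instead returns Inf here, which looks the same as a real Inf.
min_na <- suppressWarnings(min(all_na, na.rm = TRUE))
```

With the percentile approach, an Inf in the output can only come from an Inf in the data, so the two cases stay distinguishable.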

elinw commented

Ok, the percentile change is now merged, with p50 and with the minimum and maximum handled as p0 and p100. This means that for all-NA vectors they will be NA and NA. Since a user can add min and max if desired, they can still get Inf and -Inf.
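Roughly, the behaviour described above can be sketched in base R as follows. This is not the skimr implementation or its API: `default_stats`, `user_stats` and `summarise_with()` are hypothetical names for the example, and the percentile helpers are assumed to be thin wrappers around quantile().

```r
# Hypothetical percentile helpers, assumed to wrap quantile().
p0   <- function(x) quantile(x, probs = 0,   na.rm = TRUE, names = FALSE)
p50  <- function(x) quantile(x, probs = 0.5, na.rm = TRUE, names = FALSE)
p100 <- function(x) quantile(x, probs = 1,   na.rm = TRUE, names = FALSE)

# Default set of statistics, plus a user-extended set that adds min and max.
default_stats <- list(p0 = p0, p50 = p50, p100 = p100)
user_stats <- c(default_stats, list(
  min = function(x) suppressWarnings(min(x, na.rm = TRUE)),
  max = function(x) suppressWarnings(max(x, na.rm = TRUE))
))

# Apply every statistic in a named list to one vector.
summarise_with <- function(x, stats) {
  vapply(stats, function(f) f(x), numeric(1))
}

all_na <- c(NA_real_, NA_real_)
res_default <- summarise_with(all_na, default_stats)  # p0, p50, p100 all NA
res_user    <- summarise_with(all_na, user_stats)     # adds min = Inf, max = -Inf
```

The defaults stay NA for an all-NA column, and anyone who prefers the min() and max() behaviour opts into it explicitly.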