Public-Health-Scotland/phsmethods

Add thousands separator (function)

Nic-Chr opened this issue · 4 comments

Opening this to suggest including a function that adds a thousands separator to numbers.

add_thousands_sep <- function(x){ gsub("(?!^)(?=(?:\\d{3})+$)", ",", x, perl = TRUE) }

This essentially adds a thousands separator for pretty formatting of numbers, including character vectors that contain numbers.
The regex itself could use improvement, but overall it is handy when some of your numbers are stored as character vectors.
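As a rough illustration of how the pattern behaves: the lookahead `(?=(?:\\d{3})+$)` matches any position followed by a multiple of three digits running to the end of the string, and `(?!^)` excludes the very start. The inputs below are made up for demonstration. The last two show where the regex could be improved, namely signed values and values with a decimal part.

```r
# Pattern from the suggestion above, unchanged.
add_thousands_sep <- function(x) {
  gsub("(?!^)(?=(?:\\d{3})+$)", ",", x, perl = TRUE)
}

# Plain non-negative integers stored as strings behave as intended.
add_thousands_sep(c("123", "1234", "1234567"))
# returns c("123", "1,234", "1,234,567")

# A leading sign followed by exactly three (or six, nine, ...) digits
# picks up a stray comma, because only the start of the string is excluded.
add_thousands_sep("-123")
# returns "-,123"

# Digits after a decimal point are treated like the tail of an integer.
add_thousands_sep("1234.567")
# returns "1234.,567"
```

So the one-liner is safest on strings that contain only unsigned whole numbers.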

For this and #61, have you seen the {scales} package (https://scales.r-lib.org/)? It seems to do both of these things with scales::label_comma() (label_number.html) and scales::label_percent() (label_percent.html). I've only used scales::percent() in the past, but this is now apparently retired 🙄

Looking at the source, it looks like they all ultimately just use base::format(). I think it would make sense to just re-import from there, or create a wrapper function to simplify some of the options (I think some are not really applicable e.g. we will always use a comma as a thousands separator and a period for decimal).
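A minimal sketch of what such a wrapper might look like, using only base R. The name `format_comma` is hypothetical and not part of phsmethods or scales; it simply fixes the comma and period and passes everything else through to `format()`.

```r
# Hypothetical wrapper: always use "," for thousands and "." for decimals.
# trim = TRUE drops the padding that format() adds to align a vector,
# and scientific = FALSE avoids output such as "1e+05".
format_comma <- function(x, ...) {
  format(
    x,
    big.mark = ",",
    decimal.mark = ".",
    trim = TRUE,
    scientific = FALSE,
    ...
  )
}

format_comma(c(1234, 1234567))
# returns c("1,234", "1,234,567")

format_comma(1234.5, nsmall = 1)
# returns "1,234.5"
```

Because the two marks are hard-coded, passing `big.mark` or `decimal.mark` again through `...` would raise an error, which may or may not be the desired behaviour.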

scales is a pretty widely used package, so my preference would be to just point people towards that for these tasks. There's no point in re-inventing the wheel :)

I think the one advantage is there is no need for the input vector to be numeric and so one can apply a thousands separator to strings.
For example,

> scales::comma("1234")
Error in x * scale : non-numeric argument to binary operator
> add_thousands_sep("1234")
[1] "1,234"
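For comparison, a possible workaround with base R alone is to convert the strings first. This is only a sketch and assumes every element parses cleanly as a number; anything else becomes NA with a warning from `as.numeric()`.

```r
# Example input: numbers stored as character strings.
x_chr <- c("1234", "1234567")

# Convert to numeric, then let format() insert the separator.
format(as.numeric(x_chr), big.mark = ",", trim = TRUE, scientific = FALSE)
# returns c("1,234", "1,234,567")
```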

It's also faster than using base::format() or scales::comma(), at least in this benchmark:

> x <- sample(1000:10000, size = 10^5, replace = TRUE)
> microbenchmark::microbenchmark(m1 = format(x, big.mark = ","),
+                                m2 = scales::comma(x),
+                                m3 = add_thousands_sep(x), times = 5)
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval cld
   m1 2437.7410 2438.9493 2450.6998 2446.4981 2454.9681 2475.3427     5  b 
   m2 3143.6610 3152.7269 3397.6765 3156.4771 3172.5286 4362.9889     5   c
   m3   65.2607   65.3873   66.9517   65.8287   67.9849   70.2969     5 a 

The utility is quite niche so I can see why it might be too specific of a function, though I've personally found it helpful 😃

Closing this as I no longer think this would be a useful function and one can just use format() or the scales package.