Add thousands separator (function)
Nic-Chr opened this issue · 4 comments
Opening this to suggest including a function that adds a thousands separator to numbers.
add_thousands_sep <- function(x){ gsub("(?!^)(?=(?:\\d{3})+$)", ",", x, perl = TRUE) }
This essentially adds a thousands separator for pretty formatting of numbers, including character vectors that contain numbers.
The regex itself could use improvement, but overall it is handy when some of your numbers are stored as character vectors.
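As a rough sketch of the kind of improvement meant here: the pattern above appears to misplace commas for signed and decimal input (for example "-123456" becomes "-,123,456" and "1234.567" becomes "1234.,567"). One possible variant, under the assumption that only the leading integer part should be grouped, is shown below. The name add_thousands_sep2() is hypothetical and not part of any package.

```r
# Hypothetical variant of add_thousands_sep(); the name is an assumption
# made for illustration only.
add_thousands_sep2 <- function(x) {
  x <- as.character(x)
  # Leading optional sign plus integer digits, e.g. "-1234" from "-1234.56".
  int_part <- sub("^([+-]?\\d*).*$", "\\1", x, perl = TRUE)
  # Everything after the integer part (decimal part, units, ...) is left as-is.
  rest <- substring(x, nchar(int_part) + 1)
  # Insert a comma only between two digits, before complete groups of three.
  int_part <- gsub("(?<=\\d)(?=(?:\\d{3})+$)", ",", int_part, perl = TRUE)
  out <- paste0(int_part, rest)
  # paste0() would turn missing values into the string "NANA"; restore them.
  out[is.na(x)] <- NA_character_
  out
}

add_thousands_sep2(c("1234", "-123456", "1234.5678"))
```

Strings that do not start with a number are returned unchanged, and numeric input is converted with as.character() first, so numbers printed in scientific notation are not regrouped.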
For this and #61, have you seen the {scales} package (https://scales.r-lib.org/)? It seems to do both of these things with scales::label_comma() (label_number.html) and scales::label_percent() (label_percent.html). I've only used scales::percent() in the past, but this is now apparently retired.
Looking at the source, it looks like they all ultimately just use base::format(). I think it would make sense to just re-import from there, or create a wrapper function to simplify some of the options (I think some are not really applicable, e.g. we will always use a comma as a thousands separator and a period for decimals).
scales is a pretty widely used package, so my preference would be to just point people towards that for these tasks. There's no point in re-inventing the wheel :)
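For illustration, the wrapper idea mentioned above could look roughly like the sketch below. It only uses base::format(); the name format_thousands() and its digits argument are assumptions made for this example, not an existing API.

```r
# Hypothetical wrapper around base::format(); name and arguments are
# illustrative assumptions.
format_thousands <- function(x, digits = 0) {
  stopifnot(is.numeric(x))
  # Fix the separators discussed above: "," for thousands, "." for decimals.
  # nsmall keeps at least `digits` decimal places, trim = TRUE drops the
  # padding format() adds to give all elements a common width, and
  # scientific = FALSE avoids output such as "1e+06".
  format(round(x, digits), big.mark = ",", decimal.mark = ".",
         nsmall = digits, trim = TRUE, scientific = FALSE)
}

format_thousands(c(1, 1234, 1234567))
format_thousands(1234.5, digits = 2)
```

Hiding big.mark and decimal.mark behind fixed values is the whole point of the wrapper; anyone who needs other separators can still call format() or scales directly.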
I think the one advantage is that the input vector does not need to be numeric, so one can apply a thousands separator to strings.
For example,
> scales::comma("1234")
Error in x * scale : non-numeric argument to binary operator
> add_thousands_sep("1234")
[1] "1,234"
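For comparison, string input can also be handled with base R alone by coercing it first. This is a minimal sketch, assuming every element really is a plain number written as text; the helper name comma_from_chr() is hypothetical.

```r
# Hypothetical helper: coerce character input to numeric, then let
# base::format() insert the thousands separator.
comma_from_chr <- function(x) {
  # as.numeric() returns NA (with a warning) for strings that are not numbers.
  format(as.numeric(x), big.mark = ",", trim = TRUE)
}

comma_from_chr(c("1234", "5678901"))
```

The trade-off is that coercion changes the text: leading zeros are dropped and anything that is not a number is lost, whereas the regex approach only inserts commas into the original string.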
It's also faster than using base::format() or scales::comma():
> x <- sample(1000:10000, size = 10^5, replace = TRUE)
> microbenchmark::microbenchmark(m1 = format(x, big.mark = ","),
+ m2 = scales::comma(x),
+ m3 = add_thousands_sep(x), times = 5)
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval cld
   m1 2437.7410 2438.9493 2450.6998 2446.4981 2454.9681 2475.3427     5   b
   m2 3143.6610 3152.7269 3397.6765 3156.4771 3172.5286 4362.9889     5   c
   m3   65.2607   65.3873   66.9517   65.8287   67.9849   70.2969     5   a
The utility is quite niche, so I can see why it might be too specific a function, though I've personally found it helpful.
Closing this as I no longer think this would be a useful function; one can just use format() or the scales package.