feature suggestion: parameter to specify decimal separator on adorn_pct_formatting
mgacc0 opened this issue ยท 16 comments
Would you consider of interest adding a parameter "decimal.mark" (or similar) to adorn_pct_formatting
?
Yes, that sounds like a good idea. This may be a blind spot for me as an American - is this for countries where what I know as 99.1% would be written 99,1% ?
I'm from Europe.
Internally (on scripts and storage) I often use decimal point. But for reports and output sometimes it's a requirement to use decimal comma.
There are a lot of countries that use decimal comma: https://en.wikipedia.org/wiki/File:DecimalSeparator.svg
Whoa that is a lot indeed! Yes, let's add this. I am not sure when I'll be able to implement it. I'd accept a pull request if someone wants to add it, or if none comes, then I'll hopefully add it down the road.
I'm trying to think through the implementation...
- Are there other functions I'd need to modify? I don't think so...
- Are there other packages that already do this? I could study them, borrow from them, or maybe call them directly
adorn_pct_formatting
converts numeric to string.
So it should be the only one where it's needed.
As a reference:
number
from scales package https://github.com/r-lib/scales/blob/master/R/label-number.r
internally calls the base::format
function.
Just a thought here that I haven't looked into due to lack of time, but isn't this a locale setting for R? Perhaps the function could look to that for info, rather than requiring user input?
Use getOption("OutDec")
as the default value for decimal.mark
parameter:
adorn_pct_formatting <- function(dat, digits = 1, decimal.mark = getOption("OutDec"), rounding = "half to even", affix_sign = TRUE, ...)
Ooh nice. Okay I can take a shot at this. Then when I get it on a branch, I'll post here and would love for someone in a comma-using locale to try it out.
@mgacc0 I have a draft of this as #510. Can you take a look please? Hoping to finalize it in the next week or so to have it ready to go CRAN.
I ended up not adding a new argument. Instead it pulls from a user's locale, the value of getOption("OutDec")
. There's a cost to adding a new argument, in that it breaks code that relied on column names being the 5th variable, like this test: https://github.com/sfirke/janitor/blob/main/tests/testthat/test-adorn-pct-formatting.R#L164. I was encouraging people to use the ,,,
paradigm to get to the tidyselect cols argument and this would break that pattern.
My thinking was that using the user's locale would do the job virtually all the time, and in rare cases where it doesn't, they can set options(OutDec= ",")
before calling the function.
Also @YonghuiDong I see you gave ๐ above, please share if you have any thoughts on the PR linked above
I haven't heard from any potential user of this feature. I'm going to merge to main branch so it's easier for people to install the dev version, then put out a call on Mastodon and see if I can find anyone to give feedback.
I am sorry to report that the feature does not seem to be working - when using the Czech locale (we are one of the European troublemakers) I still get the dot as decimal separator; see the output below.
Also, on a personal note: it is very common practice for R users from non-English speaking countries to have set English locale as default. There are many reasons for that - such as that you don't really want to google for error messages in some marginal language such as mine, when there are troves of English resources just lying around. Also it is not the bestest of practice to have commas as decimal marks in your actual code; you want dots to make it universal.
As a consequence going by the decimal separator from locale can be unreliable. Many of the target users will have their development environment in English, but will be producing locally flavored output.
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] Czech_Czechia.1250
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] janitor_2.1.0.9000 dplyr_1.0.10
loaded via a namespace (and not attached):
[1] rstudioapi_0.14 magrittr_2.0.3 tidyselect_1.2.0 timechange_0.1.1 R6_2.5.1 rlang_1.0.6
[7] fansi_1.0.3 stringr_1.5.0 tools_4.2.0 utf8_1.2.2 cli_3.6.0 DBI_1.1.3
[13] withr_2.5.0 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.8 lifecycle_1.0.3 purrr_1.0.0
[19] tidyr_1.2.1 vctrs_0.5.1 glue_1.6.2 snakecase_0.11.0 stringi_1.7.8 compiler_4.2.0
[25] pillar_1.8.1 generics_0.1.3 lubridate_1.9.0 pkgconfig_2.0.3
> mtcars %>%
+ tabyl(am, cyl) %>%
+ adorn_percentages("col") %>%
+ adorn_pct_formatting()
am 4 6 8
0 27.3% 57.1% 85.7%
1 72.7% 42.9% 14.3%
Thank you @jlacko for the user perspective! To check something: would it be a viable workflow for a user who wanted commas in their output tabyl but not the rest of the time to set options(OutDec= ",")
before printing their tabyls?
Based on your feedback I'm thinking there should be an explicit argument in the adorn_pct_formatting
function that allows for decimal mark control. Do you/others think that the default value should be .
or the locale-specific getOption("OutDec")
?
I don't understand why the locale marker is not getting picked up in your case, but maybe it doesn't matter if that workflow is undesirable anyway.
Having thought about the issue a bit I believe the approach using OutDec
option makes great sense.
For two reasons:
- the use case for comma as decimal mark is not on actual code, but on generated output - documents in R markdown & now Quarto. In generating markdown & quarto documents is common practice to set options for stuff like
{knitr}
in the specialsetup
chunk. So there is precedent. - the
option(OutDec=",")
guides not onlyjanitor::adorn_pct_formatting()
behavior specifically, but the decimal separator generally; such as when called viabase::print()
. So setting it once at document level should lead to a consistent behavior across all printing functions.
I confess I am liking the approach the more I think about it...
Alright, so I'd encourage use of option(OutDec)
regardless of locale it sounds like. And I guess in a case like yours where it isn't getting picked up via the locale setting, that's fine (nothing to be done about it) and you could just control it via that parameter?
In that case I could leave the current code as-is and I would just need to add to the function documentation that users can directly control the printing with options(OutDec= ",")
at the .Rprofile, file, or chunk level. How does that sound?
Yes, that is what I would do. And in encouraging the use of option(OutDec)
in package documentation please mention that this setting also guides the behavior of base print, so both are aligned using one parameter - which is kind of neat, if you think of it :)