Windows code page detection
Thanks for developing radian.
I am running R 4.3.0 on Windows 11. When using Rterm as the interactive terminal in VS Code, I get:
Sys.getlocale()
[1] "LC_COLLATE=French_France.utf8;LC_CTYPE=French_France.utf8;LC_MONETARY=French_France.utf8;LC_NUMERIC=C;LC_TIME=French_France.utf8"
l10n_info()$system.codepage
[1] 65001
l10n_info()$codepage
[1] 65001
Now when using radian:
sessionInfo()$locale
[1] "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"
l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 1252
After tweaking my .Rprofile, I can force R to use UTF-8 with radian:
sessionInfo()$locale
[1] "LC_COLLATE=fr_FR.UTF-8;LC_CTYPE=fr_FR.UTF-8;LC_MONETARY=fr-FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr-FR.UTF-8"
However, the R code page now conflicts with the Windows code page:
l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 65001
Starting from Windows 10 version 1803 and R v4.2, l10n_info()$system.codepage should report 65001.
The R help page for ?Sys.setlocale says:
"From R 4.2, UCRT locale names should be used. The character set should match the system/ANSI codepage (l10n_info()$codepage be the same as l10n_info()$system.codepage). Setting it to any other value results in a warning and may cause encoding problems. As from R 4.2 on recent Windows the system codepage is 65001 and one should always use locale names ending with ".UTF-8" (except for "C" and ""), otherwise Windows may add a different character set."
Unfortunately, this is due to the lack of native UTF-8 support in Python (radian requires Python, in case you didn't know).
It seems that there is a way to change the activeCodePage in the Python executable's manifest to UTF-8 via mt.exe:
python/cpython#86873 (comment)
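To confirm which code page the Python process itself is running with, a minimal sketch like the one below may help. It assumes, as far as I understand it, that R embedded in radian reports the hosting process's ANSI code page in l10n_info()$system.codepage. The helper name active_code_page is hypothetical; GetACP is the Win32 call that returns the process's active ANSI code page.

```python
import ctypes
import sys


def active_code_page():
    """Return the active ANSI code page of this process on Windows.

    Hypothetical helper for illustration. Returns None on other
    platforms, where the Win32 API is not available.
    """
    if sys.platform != "win32":
        return None
    # GetACP takes no arguments and returns the code page identifier
    # as an unsigned integer (for example 1252, or 65001 for UTF-8).
    return ctypes.windll.kernel32.GetACP()


if __name__ == "__main__":
    print(active_code_page())
```

If this prints 1252 when run with the same python.exe that launches radian, that would be consistent with the system.codepage value shown above. A value of 65001 would suggest the manifest already requests UTF-8.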
Thanks for your answer.