Unicode not working in inf_mr()
eternal-flame-AD opened this issue · 3 comments
---
title: "Xaringan inf_mr"
output: xaringan::moon_reader
---
無限 `r system2("python", c("-c", shQuote('print("月読")')), stdout = TRUE)`
If I render through rmarkdown::render, I get the expected "無限 月読", but if I try to use inf_mr, I just get this message and a blank output:
Warning message:
In grep("<!-- DISABLE-SERVR-WEBSOCKET -->", body, fixed = TRUE) :
input string 1 is invalid in this locale
It seems like this is coming from here: 69f1279
I tried to adjust the locale settings. If I do Sys.setlocale("LC_ALL", "Ja_JP.UTF-8"), it fixes the above issue, but now the stdout is not decoded correctly. I get: 無限 <8c><8e><93>
Some more locale gymnastics within the document could probably fix that, but I think dynamic_site shouldn't assume the body is in the system locale.
My OS locale is English display language with the Shift-JIS code page (932).
[ins] r$> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.932 LC_CTYPE=English_United States.932 LC_MONETARY=English_United States.932 LC_NUMERIC=C LC_TIME=English_United States.932
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xaringan_0.28.1
loaded via a namespace (and not attached):
[1] compiler_4.2.3 fastmap_1.1.1 cli_3.6.0 htmltools_0.5.4 xfun_0.37 digest_0.6.31 rlang_1.1.0
Sys.setlocale("LC_ALL", "Ja_JP.UTF-8") may not be enough, since it only changes the locale for R, not for your operating system. Have you tried setting the locale to UTF-8 system-wide? (I don't use Windows, but I assume you can do it in the Control Panel.) Ideally, after you restart your system and R, sessionInfo() should show the UTF-8 locale.
For useBytes = TRUE, I was following an R core member's suggestion: https://blog.r-project.org/2022/10/10/improvements-in-handling-bytes-encoding/index.html
I can certainly revert 69f1279 if necessary. Thanks!
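For readers unfamiliar with the option, here is a minimal sketch of what useBytes = TRUE changes in a fixed-pattern search. It is not servr's actual code: the body value is made up for illustration, and only the marker string is taken from the warning message quoted in this issue.

```r
# Hypothetical stand-in for a response body: the UTF-8 bytes of "月" with no
# encoding mark, followed by the marker string from the warning message.
marker <- "<!-- DISABLE-SERVR-WEBSOCKET -->"
body <- paste0(rawToChar(as.raw(c(0xe6, 0x9c, 0x88))), marker)

# With useBytes = TRUE the match is done byte by byte, so the input is not
# checked for validity in the current locale before matching.
found <- grepl(marker, body, fixed = TRUE, useBytes = TRUE)
```

Because the marker is pure ASCII, a byte-wise match should give the same result whatever encoding the rest of the body is in.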
I read the article you mentioned, and I think I know where the discrepancy comes from. The cause is this line:
Line 184 in 673d979
This assumes body is in the system locale, but the HTML should be UTF-8, as declared in the meta tag. I think we should change it to something like this:
if (is.raw(body)) {
  body = rawToChar(body)
  Encoding(body) = "UTF-8"
}
I tested this and it fixes the issue.
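A standalone sketch of the same idea, for anyone who wants to try it outside of servr. The raw vector below is a hypothetical stand-in for a response body (the UTF-8 bytes of "月読"); it is not taken from the package.

```r
# Hypothetical response body: the UTF-8 bytes of "月読" as a raw vector.
body <- as.raw(c(0xe6, 0x9c, 0x88, 0xe8, 0xaa, 0xad))

if (is.raw(body)) {
  body <- rawToChar(body)     # bytes -> character string, encoding unmarked
  Encoding(body) <- "UTF-8"   # declare the bytes as UTF-8, no conversion
}

n_bytes <- nchar(body, type = "bytes")  # length in bytes
n_chars <- nchar(body, type = "chars")  # length in characters
```

Note that Encoding<- only marks the string and does not convert any bytes, so this is appropriate only when the bytes really are UTF-8, as the HTML meta tag declares here.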
Great! That is also what I guessed (I should have declared the encoding explicitly). I'll commit the fix in a minute. Thanks!