Encoding of special characters in the YAML header on Windows
crsh opened this issue
I'm having trouble with the encoding of special characters in the YAML header on Windows. Here is a minimal working example (MWE) of the .Rmd file:
---
title: "ÄÜÖäüö߀"
output: html_document
---
```{r}
rmarkdown::metadata$title
sessionInfo()
```
The resulting document contains the following text.
ÖÄÜöäü߀
rmarkdown::metadata$title
## [1] "ÖÄÜöäü߀"
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] tools_3.2.0 htmltools_0.2.6 yaml_2.1.13 rmarkdown_0.5.1
## [5] knitr_1.10 stringr_0.6.2 digest_0.6.8 evaluate_0.7
The title is rendered correctly, but when I access the same value through the metadata list, the text is garbled. I get the same result with PDF and Word output. It works like a charm on my Linux machine with the following setup:
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] tools_3.2.0 htmltools_0.2.6 yaml_2.1.13 rmarkdown_0.5.1
## [5] knitr_1.10 stringr_0.6.2 digest_0.6.8 evaluate_0.7
Is this a bug in rmarkdown?
@jjallaire Yes, yaml.load() does not seem to mark the encoding of character strings; it should have marked them as UTF-8. I just looked at the r-yaml repo, and this was reported a long time ago: vubiostat/r-yaml#6. We can certainly work around it by post-processing the character strings ourselves, but I hope @viking can fix it upstream.
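The post-processing mentioned above can be sketched in R as follows. This is a minimal sketch, not rmarkdown's actual fix: it assumes the strings returned by yaml.load() already contain UTF-8 bytes and only lack the UTF-8 encoding mark, and mark_utf8() is a hypothetical helper name. To run without the yaml package, the example simulates such an unmarked string with rawToChar() instead of calling yaml.load().

```r
# Hypothetical helper: mark every character string in a parsed YAML
# list as UTF-8. Assumes the bytes are already UTF-8 and only the
# encoding mark is missing ("unknown").
mark_utf8 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "UTF-8"
    return(x)
  }
  if (is.list(x)) {
    # Recurse into nested lists (e.g. output: html_document: ...),
    # keeping names and other attributes intact.
    x[] <- lapply(x, mark_utf8)
  }
  x
}

# Simulate what yaml.load() returns: the UTF-8 bytes of "Ä" (C3 84)
# stored in a string whose encoding is left unmarked.
title <- rawToChar(as.raw(c(0xc3, 0x84)))
meta <- list(title = title, output = list(html_document = list(toc = TRUE)))

Encoding(meta$title)     # "unknown"
fixed <- mark_utf8(meta)
Encoding(fixed$title)    # "UTF-8"
fixed$title == "\u00c4"  # TRUE
```

Encoding<- only changes how R interprets the existing bytes; it does not convert them. That is why this approach is only correct when the bytes really are UTF-8, as they are for YAML input.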
Is there a safe/reliable workaround we can use now, or should we just wait for the fix from @viking?
I believe so. Let me try in a few minutes.
FWIW here's the workaround we use in RStudio:
https://github.com/rstudio/rstudio/blob/master/src/cpp/session/modules/SessionRMarkdown.R#L104-L109
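The linked RStudio code is not reproduced here. As a separate, hedged illustration, a similar workaround could be applied at the point where the front matter is read: read the .Rmd file as UTF-8, parse the header with yaml::yaml.load(), then mark every resulting string as UTF-8. read_front_matter() and parse_front_matter() are hypothetical helpers, the delimiter handling is simplified (it only accepts a header that starts on the first line), and the yaml package is assumed to be installed.

```r
# Hypothetical helper: return the lines between the opening and closing
# "---" delimiters at the top of an .Rmd file, read as UTF-8.
read_front_matter <- function(path) {
  # encoding = "UTF-8" marks the strings read from the file as UTF-8.
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  delims <- which(trimws(lines) == "---")
  # Simplification: only a header starting on line 1 is recognized.
  if (length(delims) < 2 || delims[1] != 1) return(character(0))
  if (delims[2] - delims[1] < 2) return(character(0))
  lines[(delims[1] + 1):(delims[2] - 1)]
}

# Hypothetical helper: parse the front matter with the yaml package
# (assumed installed) and mark every resulting string as UTF-8.
parse_front_matter <- function(path) {
  yaml_text <- paste(read_front_matter(path), collapse = "\n")
  meta <- yaml::yaml.load(yaml_text)
  if (!is.list(meta)) return(meta)
  # rapply() walks nested lists and replaces only character vectors.
  rapply(meta, function(s) {
    Encoding(s) <- "UTF-8"
    s
  }, classes = "character", how = "replace")
}
```

For the MWE above, parse_front_matter("example.Rmd")$title would then be a UTF-8-marked string regardless of the session locale (the file name is only an example).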