rstudio/rmarkdown

Encoding of special characters in the YAML header on Windows

crsh opened this issue · 6 comments

crsh commented

I'm experiencing issues with the encoding of information in the YAML header on Windows. This is the MWE of the .Rmd-file:


---
title: "ÄÜÖäüö߀"
output: html_document

---

```{r}
rmarkdown::metadata$title

sessionInfo()
```

The resulting document contains the following text.

ÖÄÜöäü߀

rmarkdown::metadata$title
## [1] "ÖÄÜöäü߀"

sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## loaded via a namespace (and not attached):
## [1] tools_3.2.0     htmltools_0.2.6 yaml_2.1.13     rmarkdown_0.5.1
## [5] knitr_1.10      stringr_0.6.2   digest_0.6.8    evaluate_0.7

The title is printed correctly, but when I access the same information through the metadata list, the text is scrambled. I've tried the same thing with PDF and Word output and see the same issue. It works like a charm on my Linux machine with the following setup:

sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
## 
## locale:
##  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
## [1] tools_3.2.0     htmltools_0.2.6 yaml_2.1.13     rmarkdown_0.5.1
## [5] knitr_1.10      stringr_0.6.2   digest_0.6.8    evaluate_0.7

Is this a bug in rmarkdown?

@yihui Any idea what might be going on here?

@jjallaire Yes, yaml.load() does not seem to mark the encoding of character strings; it should mark them as UTF-8. I just looked at the r-yaml repo, and this was reported a long time ago: vubiostat/r-yaml#6. We can certainly work around it by post-processing the character strings ourselves, but I hope @viking can fix it upstream.
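
A minimal sketch of what such post-processing could look like, assuming the parsed strings really are UTF-8 bytes that simply lack an encoding mark. `mark_utf8()` is a hypothetical helper name, not a function in rmarkdown or yaml, and the sample list only stands in for the output of yaml.load():

```r
# Hypothetical helper (not part of rmarkdown or yaml): walk a parsed YAML
# structure and declare every character string to be UTF-8.
# Assumption: the bytes are already valid UTF-8 and only the mark is missing.
mark_utf8 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "UTF-8"  # changes the declared encoding, not the bytes
    return(x)
  }
  if (is.list(x)) {
    # Keys may contain non-ASCII text as well
    if (!is.null(names(x))) names(x) <- mark_utf8(names(x))
    x[] <- lapply(x, mark_utf8)  # recurse into nested mappings/sequences
  }
  x
}

# Stand-in for a parser result: the UTF-8 bytes of "Ä" with no encoding mark
raw_title <- rawToChar(as.raw(c(0xc3, 0x84)))
meta <- list(title = raw_title, output = "html_document")

Encoding(meta$title)   # "unknown" before the fix
fixed <- mark_utf8(meta)
Encoding(fixed$title)  # "UTF-8" after the fix
```

Pure ASCII strings such as "html_document" are left effectively unchanged, since R never marks ASCII strings with an encoding. Whether this is safe for every YAML input is not settled in this thread.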

Is there a safe/reliable workaround for us now, or should we just wait for the fix from @viking?

I believe so. Let me try in a few minutes.
