baranovskypd/goodpress

Encoding


Hi!
Thanks a lot for your work. I have an encoding problem when publishing: every ' becomes ’, for instance, and since I am French, all my accented characters turn into nonsense. I suspect a charset/encoding problem, but I don't know how to resolve it. I tried putting encoding: UTF-8 in the YAML, but it changed nothing. I also checked that my WordPress DB is in UTF-8.
The text is already garbled when it arrives on WordPress: when I edit the post's code there, the ’ sequences are hardcoded.
Do you have an idea?
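
The ’ pattern is what UTF-8 bytes look like when they are decoded as Windows-1252. Below is a minimal sketch that reproduces the symptom in base R, assuming the platform's iconv knows the "windows-1252" name. The variable names are illustrative and are not part of goodpress.

```r
# The right single quotation mark (U+2019) is three bytes in UTF-8: e2 80 99.
apostrophe <- enc2utf8("\u2019")
utf8_bytes <- charToRaw(apostrophe)

# Misread those three bytes as Windows-1252, then convert the result to UTF-8.
# Each byte becomes its own character, which is the mojibake seen in the post.
garbled <- iconv(apostrophe, from = "windows-1252", to = "UTF-8")

print(utf8_bytes)
print(garbled)
```

If the output matches the characters seen in WordPress, the text was most likely read with the wrong encoding somewhere before it was sent, not corrupted by the database.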

Hello! Sorry, I don't have time to dive into this at the moment, but it sure sounds like a problem (my name is Maëlle, so I understand...). If you do find a solution and make a PR, I'd try to merge it rapidly!

httr seems to have an encode argument, not sure if relevant https://httr.r-lib.org/reference/POST.html

Code for HTTP stuff in goodpress https://github.com/maelle/goodpress/blob/main/R/utils-http.R

Or maybe the encoding problem comes sooner

Sorry for not being able to help more!

Also, your operating system might be a factor (e.g. if it is Windows; I used goodpress on Ubuntu and, well, with English text 😬).

A last comment in case you are new to debugging packages https://github.com/jennybc/debugging

Hi Maëlle!
I just tried with WSL2-Ubuntu and I do not have the problem there, so Windows 10 is definitely the culprit. I will try to find out why.

OK, I found the problem: it comes from the use of readLines() to pass the HTML to glue::glue_collapse() in wp_post(). To fix it, change the function signature from:
function (post_folder, wordpress_url)
To:
function (post_folder, wordpress_url, encoding = "UTF-8")
And the offending line from:
body <- glue::glue_collapse(readLines(html_path), sep = "\n")
To:
body <- glue::glue_collapse(readLines(html_path, encoding = encoding), sep = "\n")
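
A self-contained sketch of the idea behind this fix follows. It uses base R only, with paste() standing in for glue::glue_collapse() so that nothing beyond base R is required. The sample text and the temporary file are made up for illustration. The encoding argument of readLines() does not re-encode the input; it only marks the strings as UTF-8, which is what stops R on Windows from treating the bytes as the native code page.

```r
# Write a small HTML fragment to disk as raw UTF-8 bytes (hypothetical sample text).
html_path <- tempfile(fileext = ".html")
original <- "<p>L\u2019\u00e9t\u00e9 \u00e0 No\u00ebl</p>"
con <- file(html_path, open = "wb")
writeBin(charToRaw(enc2utf8(original)), con)
writeBin(charToRaw("\n"), con)
close(con)

# Read the file back, declaring that its content is UTF-8.
lines <- readLines(html_path, encoding = "UTF-8")

# Collapse the lines into a single body string, as wp_post() does with glue.
body <- paste(lines, collapse = "\n")

print(Encoding(lines))
print(body == original)
```

On a system whose native encoding is already UTF-8, the result is the same with or without the argument, which is consistent with the problem not appearing under WSL2-Ubuntu.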
I'm too lazy ("la flemme") to pull your whole git repo…