datawookie/emayili

[FEAT]: Encoding of non-ASCII characters in display names and subjects

2ndmax opened this issue · 9 comments

Description

Hi @datawookie!

I have another suggestion regarding the encoding of display names and e-mail subjects. In German we use 'Umlaute' and other characters that are not part of US-ASCII. I understand that raw non-ASCII characters are not permitted in display names and e-mail subjects, but e-mail clients usually wrap display names and subjects in encoded words (RFC 2047) once such characters are detected.

When using emayili I do the same: I wrap display names in '=?iso-8859-1?Q?' and '?=' and encode the string with emayili's qp_encode(). I handle the e-mail subject the same way:

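emayili::envelope() |>
  emayili::from(emayili::address(email = "Max.Power@test.com",
                                 # qp_encode() encodes the UTF-8 bytes of the name (=C3=84 for Ä
                                 # in the result below), so "=?UTF-8?Q?" would match those bytes
                                 # better than the iso-8859-1 label used here.
                                 display = paste0("=?iso-8859-1?Q?",
                                                  emayili::qp_encode(x = "Max ÄÖÜ Power"),
                                                  "?="))) |>
  print()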

The result is:

Date: Wed, 23 Mar 2022 16:14:16 GMT
X-Mailer: {emayili}-0.7.5
MIME-Version: 1.0
From: =?iso-8859-1?Q?Max =C3=84=C3=96=C3=9C Power?= <Max.Power@test.com>

I was wondering whether you would like to implement a more elegant solution than my attempts. Using a regular expression, characters that are not part of US-ASCII could be detected, and in that case the string could be encoded and wrapped. Maybe that behaviour could be enabled or disabled. And perhaps both quoted-printable (Q) and Base64 (B) encoding could be implemented, using UTF-8 as the charset...
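The detect-then-wrap idea above can be sketched in base R. This is a minimal illustration under stated assumptions, not emayili's implementation: `has_non_ascii()`, `q_encode_word()` and `encode_if_needed()` are hypothetical helper names, and the sketch assumes RFC 2047 "Q" encoding with a UTF-8 charset label. Instead of a regex it checks the UTF-8 bytes directly, keeps only ASCII letters and digits literal, maps spaces to `_`, and hex-encodes every other byte.

```r
# Hypothetical helpers (not part of emayili): detect non-ASCII text and
# wrap it in an RFC 2047 "Q" encoded word with a UTF-8 charset label.

# TRUE if any byte of the UTF-8 representation lies outside 7-bit ASCII.
has_non_ascii <- function(x) {
  any(as.integer(charToRaw(enc2utf8(x))) > 127L)
}

# Q-encode a string: ASCII letters and digits stay literal, spaces become
# "_", and every other byte is written as "=XX" (uppercase hex).
q_encode_word <- function(x, charset = "UTF-8") {
  bytes <- as.integer(charToRaw(enc2utf8(x)))
  literal <- (bytes >= 48L & bytes <= 57L) |   # 0-9
             (bytes >= 65L & bytes <= 90L) |   # A-Z
             (bytes >= 97L & bytes <= 122L)    # a-z
  out <- sprintf("=%02X", bytes)
  out[literal] <- vapply(bytes[literal], function(b) rawToChar(as.raw(b)),
                         character(1))
  out[bytes == 32L] <- "_"
  paste0("=?", charset, "?Q?", paste(out, collapse = ""), "?=")
}

# Only wrap when needed, so plain ASCII names pass through unchanged.
encode_if_needed <- function(x) {
  if (has_non_ascii(x)) q_encode_word(x) else x
}

encode_if_needed("Max Power")
# "Max Power"
encode_if_needed("Max \u00c4\u00d6\u00dc Power")
# "=?UTF-8?Q?Max_=C3=84=C3=96=C3=9C_Power?="
```

Compared with the hand-made wrapper in the original post, spaces are encoded as `_` and the charset label matches the UTF-8 bytes that are actually emitted. A fuller version would also need to split text longer than the 75-character limit RFC 2047 places on a single encoded word.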

I do not know how much work that might be, but in my opinion an encoding option could make emayili a bit more international :-)

Have a nice day!

Hi @2ndmax,

Thanks for bringing this issue to my attention. It's always interesting to learn about issues that pertain to languages with characters outside of the narrow ASCII space used by English!

I've implemented a simple fix in the utf-8 branch. Could you please test it and give me some feedback? I'm aware that addresses now don't display very nicely in R, but I focused on getting them working in the emails first. Once I have that sorted I'll worry about the aesthetics.

You can install the branch as follows:

devtools::install_github("datawookie/emayili", ref = "utf-8")

Example:

address(email = "hans@gmail.com", display = "Hansjörg Müller")
[1] "=?UTF-8?B?SGFuc2rDtnJnIE3DvGxsZXI=?= <hans@gmail.com>"
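The `B` in `=?UTF-8?B?...?=` means that the UTF-8 bytes of the display name are Base64-encoded. The base-R sketch below reproduces the example output; `b64_encode()` and `b_encode_word()` are hypothetical names, not emayili functions, and in practice a package such as base64enc could replace the hand-written Base64 loop.

```r
# Hypothetical sketch (not emayili's code): RFC 2047 "B" encoding in base R.

# Minimal Base64 encoder for a raw vector (RFC 4648 alphabet, "=" padding).
b64_encode <- function(bytes) {
  if (length(bytes) == 0L) return("")
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  pad <- (3L - length(bytes) %% 3L) %% 3L
  v <- c(as.integer(bytes), rep(0L, pad))
  out <- character(0)
  for (i in seq(1L, length(v), by = 3L)) {
    # Pack three bytes into one 24-bit integer, then emit four 6-bit digits.
    triple <- v[i] * 65536L + v[i + 1L] * 256L + v[i + 2L]
    idx <- c(triple %/% 262144L, (triple %/% 4096L) %% 64L,
             (triple %/% 64L) %% 64L, triple %% 64L)
    out <- c(out, alphabet[idx + 1L])
  }
  # Digits produced only by the zero padding are replaced with "=".
  if (pad > 0L) out[(length(out) - pad + 1L):length(out)] <- "="
  paste(out, collapse = "")
}

# Wrap the Base64 text of the UTF-8 bytes in an encoded word.
b_encode_word <- function(x, charset = "UTF-8") {
  paste0("=?", charset, "?B?", b64_encode(charToRaw(enc2utf8(x))), "?=")
}

b_encode_word("Hansj\u00f6rg M\u00fcller")
# "=?UTF-8?B?SGFuc2rDtnJnIE3DvGxsZXI=?="
```

Both encodings are valid for display names and subjects: B is usually more compact for text that is mostly non-ASCII, while Q leaves mostly-ASCII text partly readable in the raw header.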

Best regards, Andrew.

Update: I've sorted out the aesthetics too now. @2ndmax, could you please give me your feedback on this feature today, if possible? Thanks.

Hello @datawookie,

thanks for taking up this task :-)

I just installed the utf-8 branch from GitHub, but unfortunately it does not work. When I input something like

address(email = "hans@gmail.com", display = "Hansjörg Müller")

or even some Japanese characters like

address(email = "hans@gmail.com", display = "美味しい")

the output is just

[1] "Hansjörg Müller <hans@gmail.com>"
[1] "美味しい <hans@gmail.com>"

without any wrapping.

I looked at the code of format.address() in my installed package and can see your utf-8 changes, so I believe I installed the correct branch.

Right now I have no idea why this code does not detect or wrap the non-ASCII characters. Could it be because I work on a non-English system, so these characters are not recognized as 'special'?

If I can provide you with additional information, please let me know.

Best regards and have a nice day, Maik.

> hans <- address(email = "hans@gmail.com", display = "Hansjörg Müller")
> hans
[1] "Hansjörg Müller <hans@gmail.com>"
> hans %>% as.character()
[1] "Hansjörg Müller <hans@gmail.com>"
> hans %>% as.character(encode = TRUE)
[1] "=?UTF-8?B?SGFuc2rDtnJnIE3DvGxsZXI=?= <hans@gmail.com>"

The extra encode argument ensures that the display name is still rendered "normally" in the console but is encoded when you send the message (which, I believe, is when it needs to be encoded).
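The design of storing the plain display name and encoding it only when the message is serialized can be sketched with a small S3 class. Everything here is hypothetical: `demo_address`, `q_word()` and the methods are illustrative names, not emayili's internals, and the sketch uses Q encoding only to keep it short.

```r
# Hypothetical S3 sketch of "encode only on demand" (not emayili's code).

demo_address <- function(email, display) {
  structure(list(email = email, display = display), class = "demo_address")
}

# RFC 2047 Q encoding: letters/digits literal, space -> "_", else "=XX".
q_word <- function(x) {
  b <- as.integer(charToRaw(enc2utf8(x)))
  keep <- (b >= 48L & b <= 57L) | (b >= 65L & b <= 90L) | (b >= 97L & b <= 122L)
  out <- sprintf("=%02X", b)
  out[keep] <- vapply(b[keep], function(i) rawToChar(as.raw(i)), character(1))
  out[b == 32L] <- "_"
  paste0("=?UTF-8?Q?", paste(out, collapse = ""), "?=")
}

# encode = FALSE (the default) keeps the readable form for the console;
# encode = TRUE produces the header-safe form used when sending.
as.character.demo_address <- function(x, encode = FALSE, ...) {
  display <- x$display
  if (encode && any(utf8ToInt(enc2utf8(display)) > 127L)) {
    display <- q_word(display)
  }
  paste0(display, " <", x$email, ">")
}

# Printing always uses the unencoded form.
print.demo_address <- function(x, ...) {
  print(as.character(x))
  invisible(x)
}

hans <- demo_address("hans@gmail.com", "Hansj\u00f6rg M\u00fcller")
hans
as.character(hans, encode = TRUE)
# "=?UTF-8?Q?Hansj=C3=B6rg_M=C3=BCller?= <hans@gmail.com>"
```

Keeping the encoding step in one serialization function means the stored value is never double-encoded; the trade-off is that the sending code has to remember to ask for the encoded form.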

Absolutely right. I just tested sending a message, and that works fine! The display names are UTF-8 encoded and wrapped where needed.

I do not want to push my luck, but could you do the same for the e-mail subject? That might contain non-ASCII characters as well. After that, every part of the e-mail header and body should be able to handle non-ASCII characters.

Bye, Maik.

Ah, you cheeky devil! :) I'll get onto that now.

Okay, I think I have this working. Try something like this:

envelope() %>% subject("Möbelträgerfüße")

You'll need to add To and From addresses.

As with the display names, the encoding will only be applied when the message is sent, so the original (not encoded) subject will be visible when you look at the message in the console.
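A subject can get the same treatment, with one extra wrinkle: RFC 2047 limits each encoded word to 75 characters, so a long subject has to be split into several encoded words without cutting a multi-byte character in half. The sketch below is hypothetical (`encode_subject()` and `q_char()` are illustrative names, not emayili's API). It Q-encodes one character at a time, packs the pieces greedily into words, and joins the words with a folding line break. For simplicity it ignores the extra length taken by the `Subject: ` prefix on the first header line.

```r
# Hypothetical sketch (not emayili's API): Q-encode a subject, splitting it
# into RFC 2047 encoded words of at most `max_len` characters each.

# Encode one character; the bytes of a multi-byte character stay together.
q_char <- function(ch) {
  if (ch == " ") return("_")
  if (ch %in% c(LETTERS, letters, 0:9)) return(ch)
  paste(sprintf("=%02X", as.integer(charToRaw(ch))), collapse = "")
}

encode_subject <- function(subject, max_len = 75L) {
  codes <- utf8ToInt(enc2utf8(subject))
  if (all(codes < 128L)) return(subject)          # plain ASCII: leave as is
  chars <- intToUtf8(codes, multiple = TRUE)      # one string per character
  pieces <- vapply(chars, q_char, character(1), USE.NAMES = FALSE)
  prefix <- "=?UTF-8?Q?"
  suffix <- "?="
  budget <- max_len - nchar(prefix) - nchar(suffix)
  words <- character(0)
  current <- ""
  for (p in pieces) {
    # Start a new encoded word if this piece would overflow the current one.
    if (nchar(current) > 0L && nchar(current) + nchar(p) > budget) {
      words <- c(words, paste0(prefix, current, suffix))
      current <- ""
    }
    current <- paste0(current, p)
  }
  words <- c(words, paste0(prefix, current, suffix))
  # CRLF followed by a space folds the header onto a continuation line.
  paste(words, collapse = "\r\n ")
}

encode_subject("M\u00f6beltr\u00e4gerf\u00fc\u00dfe")
# "=?UTF-8?Q?M=C3=B6beltr=C3=A4gerf=C3=BC=C3=9Fe?="
```

Decoders ignore the whitespace between adjacent encoded words, so the recipient sees one continuous subject even when it was split.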

Let me know your feedback!

First: Props to you for finding a word that contains every Umlaut but has no useful meaning :)

Second: Yes, that works perfectly, thank you for your work!

Do you already have a plan for pushing this to the master branch?

Bye, Maik.

Merged in #125.