Non utf8 content in raw email

Question

Pofilo opened this issue 3 years ago · 0 comments

I received an email whose subject contained a character that was not encoded as valid UTF-8.

The subject was something like: command n° 193
The ° is the problem: its byte is not valid UTF-8, so it shows up as: command n� 193

The error was:

Traceback (most recent call last):
  File "/imapbackup38.py", line 283, in scan_folder
    msg_id = MSGID_RE.match(header).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/imapbackup38.py", line 761, in <module>
    main()
  File "/imapbackup38.py", line 671, in main
    fol_messages = scan_folder(
  File "/imapbackup38.py", line 297, in scan_folder
    data_str = str(data[0][1], 'utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 145: invalid start byte
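
For anyone who wants to reproduce the second exception without the original mail, here is a minimal sketch. It assumes the subject was encoded as Latin-1, where ° is the single byte 0xb0 (the byte named in the traceback). The header line and the variable names are made up for illustration; the real message is not attached to this issue.

```python
# Hypothetical header line; the actual raw message was not included in the report.
# Assumption: the sender encoded the subject as Latin-1, where "°" is byte 0xb0.
raw_header = "Subject: command n° 193".encode("latin-1")

try:
    # Same kind of call as line 297 of the script: strict UTF-8 decoding.
    str(raw_header, "utf-8")
    error = None
except UnicodeDecodeError as exc:
    # 0xb0 is a UTF-8 continuation byte, so it cannot start a character.
    error = exc

if error is not None:
    print(f"byte {error.object[error.start]:#x} at position {error.start}: {error.reason}")
```

The position differs from the 145 in the traceback because this sample contains only the subject line, not the full header block.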

I'm sending a PR to fix this. It passes errors='replace' to str(), so any byte that is not valid UTF-8 is replaced instead of raising an exception.
With the PR, the subject looks like: command n? 193, the mail is retrieved, and the script goes on to download the remaining mails.
This is not perfect, because the subject is altered, but at least the script no longer crashes and the mail is retrieved apart from the replaced character.
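
As a rough sketch of the trade-off described here (not the code from the PR), the two helpers below compare lossy decoding with a fallback encoding. Both function names are hypothetical, and choosing Latin-1 as the fallback is an assumption that only fits mails like this one. Python's 'replace' handler inserts U+FFFD for each undecodable byte, which some terminals display as "?".

```python
def decode_lossy(raw: bytes) -> str:
    """Decode as UTF-8, replacing each invalid byte with U+FFFD.

    Hypothetical helper illustrating the errors='replace' approach.
    """
    return str(raw, "utf-8", errors="replace")


def decode_with_fallback(raw: bytes, fallback: str = "latin-1") -> str:
    """Try strict UTF-8 first, then a fallback encoding.

    Hypothetical alternative: Latin-1 maps every byte to a character,
    so it never raises, but it is only a guess at what the sender used.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


sample = b"command n\xb0 193"  # made-up bytes matching the subject in this report
print(decode_lossy(sample))
print(decode_with_fallback(sample))
```

The fallback keeps the ° in this case, but it would silently produce wrong characters for a mail in some other 8-bit encoding, so the lossy version is the more conservative choice.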