Non-UTF-8 content in raw email
Pofilo opened this issue
I had an email whose subject contained a character that is not valid UTF-8.
The subject was something like "command n° 193", but the "°" character is a problem: it appears as "command n� 193".
The error was:
Traceback (most recent call last):
  File "/imapbackup38.py", line 283, in scan_folder
    msg_id = MSGID_RE.match(header).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/imapbackup38.py", line 761, in <module>
    main()
  File "/imapbackup38.py", line 671, in main
    fol_messages = scan_folder(
  File "/imapbackup38.py", line 297, in scan_folder
    data_str = str(data[0][1], 'utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 145: invalid start byte
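The failure can be reproduced in isolation. This sketch assumes the subject was encoded in ISO-8859-1 (Latin-1) or Windows-1252, where "°" is the single byte 0xb0; the header bytes below are made up for illustration and are not taken from the real email.

```python
# Hypothetical raw header bytes (assumption: Latin-1 encoded subject).
# In Latin-1, "°" is the single byte 0xb0, which is not a valid UTF-8 start byte.
raw = b"Subject: command n\xb0 193"

# Decoding as Latin-1 recovers the intended text.
print(raw.decode("latin-1"))  # Subject: command n° 193

# Strict UTF-8 decoding, as done by str(data[0][1], 'utf-8'), raises the same
# kind of error as in the traceback (the position differs because the bytes differ).
try:
    str(raw, "utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print(error)  # 'utf-8' codec can't decode byte 0xb0 in position 18: invalid start byte
```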
I'm sending a PR to fix this. It passes the "replace" error handler to str(), so any byte that is not valid UTF-8 is replaced instead of raising an exception.
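A minimal sketch of the same idea, not the PR's actual diff: the third positional argument of str() is the error handler, and "replace" makes the UTF-8 decoder substitute U+FFFD (the replacement character) for each invalid byte. The helper name decode_raw_message is hypothetical.

```python
def decode_raw_message(data: bytes) -> str:
    """Decode raw message bytes as UTF-8 without raising.

    Hypothetical helper: invalid UTF-8 bytes become U+FFFD instead of
    triggering UnicodeDecodeError.
    """
    return str(data, "utf-8", "replace")


raw = b"Subject: command n\xb0 193"  # made-up sample bytes
text = decode_raw_message(raw)
print(text)  # Subject: command n� 193
```

In scan_folder, this would stand in for the strict call `str(data[0][1], 'utf-8')` shown in the traceback. How the replacement character is finally displayed depends on where the text is printed or saved.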
With the PR, the subject looks like "command n? 193", the mail is retrieved correctly, and the script continues downloading the remaining mails.
This is not perfect because the subject is altered, but at least the script doesn't crash and the mail is partly retrieved.
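If the altered subject matters, one possible alternative (not part of the PR) is Python's "surrogateescape" error handler. It maps each invalid byte to a lone surrogate code point, so encoding the string back with the same handler restores the original bytes exactly. The trade-off is that such strings cannot be written as strict UTF-8 unless they are re-encoded with the same handler. The sample bytes below are made up.

```python
raw = b"Subject: command n\xb0 193"  # made-up sample bytes

# "replace" is lossy: the original byte 0xb0 cannot be recovered.
lossy = raw.decode("utf-8", "replace")
print(lossy.encode("utf-8") == raw)  # False

# "surrogateescape" keeps 0xb0 as the lone surrogate U+DCB0.
# repr() is used because printing a lone surrogate to a strict UTF-8 stream fails.
lossless = raw.decode("utf-8", "surrogateescape")
print(repr(lossless))  # 'Subject: command n\udcb0 193'

# Encoding with the same handler gives back the exact original bytes.
restored = lossless.encode("utf-8", "surrogateescape")
print(restored == raw)  # True
```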