TeamMsgExtractor/msg-extractor

Header fields (From, To, Subject, etc.) are not generated properly

fatihkaymak opened this issue · 3 comments

  • Version of extract_msg: 0.41.2
  • Your python version: Python 3.10
  • How did you launch extract_msg?
    • I used the extract_msg package

Describe the bug
It read the .msg file and generated an MSG object, but the header info (From, To, Subject, etc.) was not correct. The message content contains Turkish characters, and these are the problematic part.

What code did you use or can we use to reproduce this error?

import extract_msg

with extract_msg.openMsg('data/sample-mail.msg', overrideEncoding='utf-8') as msg:
    html = msg.htmlBody.decode("utf-8")
    # The header block printed here shows the undecoded From/To values.
    print(msg.htmlInjectableHeader)

Is there a message.msg file you want to share to help us reproduce this?

Additional context
It generated this header:

From: 
=?iso-8859-9?Q?Fatih_Kaymak_=28M=FC=FEteri_ve_Sat=FD=FE_Teknolojileri_B?=
=?iso-8859-9?B?9mz8bfwp?= <Fatih.Kaymak@akbank.com>
Sent: Wed, 07 Jun 2023 15:41:30 +0300
To: =?iso-8859-9?Q?Fatih_Kaymak_=28M=FC=FEteri_ve_Sat=FD=FE_Teknolojileri_B?= =?iso-8859-9?B?9mz8bfwp?= <Fatih.Kaymak@akbank.com>
Subject: extract-msg sampla mail

Looks like the email module isn't parsing the header right for that part. Didn't think that would be an issue, so now I have to figure out how to decode it.

Edit: Ah, I see now. It looks like the way the module parses the header doesn't decode those encoded-word strings (the =?charset?Q?...?= form), but email.header.decode_header can be used with some code to give the corrected value. I'll try to get a fix out within the next week or so.
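For anyone who needs a workaround in the meantime, here is a minimal sketch of the kind of decoding described above, using only the standard library. It is not the actual fix that went into extract_msg; the helper name `decode_mime_header` is hypothetical, and the sample value is the From line from the report joined onto one line.

```python
from email.header import decode_header


def decode_mime_header(value):
    """Decode RFC 2047 encoded words in a header value (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes with their declared charset;
            # unencoded runs next to them come back as bytes with charset None.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header with no encoded words at all is returned as a plain str.
            parts.append(chunk)
    return "".join(parts)


raw_from = (
    "=?iso-8859-9?Q?Fatih_Kaymak_=28M=FC=FEteri_ve_Sat=FD=FE_Teknolojileri_B?= "
    "=?iso-8859-9?B?9mz8bfwp?= <Fatih.Kaymak@akbank.com>"
)
print(decode_mime_header(raw_from))
```

The two adjacent encoded words share a charset, so decode_header merges them before the bytes are decoded, which is why the name comes out as one continuous string followed by the untouched address.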

This issue is now fixed; the fields should look correct now (and they do look correct on the email I tested against).

Thank you.