koodaamo/tnefparse

unable to extract content from winmail.dat (Mail is a read receipt)

Sam-Gracy opened this issue · 12 comments

Hi @petri ,

I am trying to get the mail body of a read receipt by parsing its winmail.dat file, but I am not able to get it with tnefparse.

This is the code I am using ...

import logging

from tnefparse import TNEF

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

filename = "winmail.dat"
with open(filename, "rb") as f:
    TNEFObject = TNEF(f.read(), do_checksum=True)

# Try the RTF body first, then the HTML body, then the plain-text body
mailBody = getattr(TNEFObject, "rtfbody", None)
if mailBody is None:
    logger.debug("RTF Body not found")
    mailBody = getattr(TNEFObject, "htmlbody", None)
    if mailBody is None:
        logger.debug("HTML Body not found")
        mailBody = getattr(TNEFObject, "body", None)
        if mailBody is None:
            logger.debug("Mail Body not found")
        else:
            logger.debug("Found the mail mailBody from TNEF file")
    else:
        logger.debug("Found the HTML mailBody from TNEF file")
else:
    logger.debug("Found the rich text mailBody from TNEF file")

I have attached the winmail.dat file for reference. Could you please help me find out what the issue is?

Note: I am able to get the body of a delivery report email with tnefparse without any issue.

Many Thanks,
Suresh.

Hi @petri ,

Could you please take a look at this issue?

Many Thanks,
Suresh.

Suresh, are you sure there is a body?

From what little I have read, it looks as if possibly only some headers are set. And each mail client or server is probably free to do whatever it likes.

If this is an important topic for you (and @petri does not know more about it), I'd recommend reading the RFCs.

This also looks interesting:
https://www.limilabs.com/blog/creating-read-receipt-mdn

There, a read receipt gets built and its text is set manually, which supports my point that the client or server can do whatever it likes.

Some more to read:
https://techcommunity.microsoft.com/t5/exchange-team-blog/everything-you-wanted-to-know-about-read-receipts-and-delivery/ba-p/587312

The RFCs
https://tools.ietf.org/html/rfc3798
https://tools.ietf.org/html/rfc8098
https://www.ietf.org/rfc/rfc2298.txt

Hi,

Thanks a lot for sharing the valuable RFC information.

Regarding "Suresh, are you sure there is a body?": yes. When I open the email (.eml) file in Outlook, it displays the read receipt properly. Since the MIME content type is TNEF, the mail body and attachments are embedded in winmail.dat. That is why I shared the sample winmail.dat for reference.

I can share the .eml file as well if you would like to look into it for debugging.

As for your point that there are possibly only some headers set and that each mail client or server can do whatever it likes: yes, I agree with that.
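If you only have the .eml file, the embedded winmail.dat can be pulled out with Python's standard email package and then handed to TNEF(...). This is a minimal sketch, assuming the TNEF part is labelled application/ms-tnef (the usual MIME type for winmail.dat); extract_tnef_parts is a hypothetical helper name, not part of tnefparse.

```python
import email
import sys
from email import policy


def extract_tnef_parts(raw_eml: bytes):
    """Return a list of (filename, payload_bytes) for every application/ms-tnef part."""
    msg = email.message_from_bytes(raw_eml, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() == "application/ms-tnef":
            # decode=True undoes the Content-Transfer-Encoding (usually base64)
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


if __name__ == "__main__":
    # Hypothetical usage: python extract_tnef.py receipt.eml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for name, data in extract_tnef_parts(f.read()):
                print(name, len(data), "bytes")
```

The returned bytes are what you would otherwise read from winmail.dat on disk, so they can be passed straight to the TNEF constructor.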

Thank you once again for helping me.
Suresh.

You said that Outlook displays the read receipt properly when you open the .eml file.

Could you provide a screenshot of the read receipt?

Is the text (body) you see there something the other person actively wrote, or is it auto-generated by Outlook, maybe from the header information?

Also, yep, it would help if you could provide the .eml file too.

I cannot promise anything, but I will certainly have a look.

Hi,

This is how it is displayed when I open the .eml file in Outlook:

[Screenshot: the read receipt as displayed in Outlook]

This read receipt was generated by Outlook/Exchange because I requested a read receipt while composing the email. Please see the attached winmail.dat and .eml files.

Many Thanks,
Suresh

Unfortunately, I have no time to investigate further in the next few days, but from a quick look at the screenshot, I think it is possible that there is no body at all. I assume Outlook generates this text from the headers; for example, when I open the same message in Thunderbird, I do not see this text.

Thank you for your analysis. I am also still trying to find a way to parse this kind of email.

petri commented

@Sam-Gracy have you tried dumping the contents of the tnef file to see what it's made of? tnefparse -h will tell you how.

You will see that there are no "attachments", i.e. no mail bodies. Take a look at the dump, i.e. the various MAPI_* properties it shows, and do some reading on those that seem to be related to the read receipt.
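To see the overall shape of a winmail.dat independently of any particular tnefparse option, the top-level attribute stream can also be walked with the standard library. This is a minimal sketch based on my reading of Microsoft's [MS-OXTNEF] layout: a 4-byte signature 0x223E9F78 and a 2-byte legacy key, then repeated attributes made of a 1-byte level (0x01 message, 0x02 attachment), a 4-byte attribute ID, a 4-byte length, the payload, and a 2-byte checksum. iter_tnef_attributes is a hypothetical helper; it only lists attributes and does not decode the MAPI property blobs inside them, which is what tnefparse does.

```python
import struct
import sys

TNEF_SIGNATURE = 0x223E9F78  # stored little-endian as bytes 78 9F 3E 22
LEVELS = {0x01: "message", 0x02: "attachment"}


def iter_tnef_attributes(data: bytes):
    """Yield (level, attr_id, length, checksum_ok) for each top-level TNEF attribute."""
    if len(data) < 6:
        raise ValueError("too short to be a TNEF stream")
    signature, _legacy_key = struct.unpack_from("<IH", data, 0)
    if signature != TNEF_SIGNATURE:
        raise ValueError(f"not a TNEF stream (signature {signature:#010x})")
    offset = 6
    while offset < len(data):
        if offset + 9 > len(data):
            raise ValueError("truncated attribute header")
        # 1-byte level, 4-byte attribute ID, 4-byte payload length (no padding with "<")
        level, attr_id, length = struct.unpack_from("<BII", data, offset)
        offset += 9
        if offset + length + 2 > len(data):
            raise ValueError("truncated attribute payload")
        payload = data[offset:offset + length]
        (checksum,) = struct.unpack_from("<H", data, offset + length)
        offset += length + 2
        # The checksum is the sum of the payload bytes, modulo 65536
        yield level, attr_id, length, checksum == sum(payload) % 0x10000


if __name__ == "__main__":
    # Hypothetical usage: python tnef_walk.py winmail.dat
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for level, attr_id, length, ok in iter_tnef_attributes(f.read()):
                print(f"{LEVELS.get(level, level)!s:<10} id={attr_id:#010x} "
                      f"len={length:<6} checksum_ok={ok}")
```

If every attribute listed is at the message level and none at the attachment level, that would be consistent with the observation above that the file carries no attachments.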

@petri
I have gone through those values, but I didn't understand much about them. I am closing this ticket. Thanks for your help.

Hi @jugmac00,

I can see that this header contains some information. How can I decode it?

X-MS-Exchange-Forest-IndexAgent-0: AQ0CZW4AAX4DAAAPAAADH4sIAAAAAAAEAIVU227bOBBlElu3rPcXdu
qnBkjth74FbYHsBehDigJNuot9KiiJioTQokBS0fpv/B9984fst+wh
abkq2nQBQRZnzsycOTP0v+mvWypFxXtpL0k27YOhoZGSeNcJrokbqp
SUajDUtMThyDUOQpurRXbdUt8+NqaxovSx1BhYSqFxgIm3JeWyF0DS
/8C6XndS+JS8sM2jeAqoRbnI7pruar+78WwL3pKqqFC9NoJyQcZuJc
BDY2v67fb2kqyie2ERrWwtNEmlHp4tskX29u7dDYUkL+iuFmS5dsBr
a3WT9xZ0puIg2JNC7o7fiyAT6pWN6STfwgyFHKjotRatHZUCsC3VsK
I7RUXNW4TaujGXtFU9bXpjyXSiaKrtkeGBR6X0segKTddivwuu/Y6P
HA/BjcDcECtct6oT7ZRuqYp+A0ZPJ3Ei1hyqq1Y4NV1wmHvT3tMjxx
DdwD8ZIav97gX9HkRZ0XuUMh4+FhlVMHwjDq2vLc/dJjWWBvwUsike
3Bg/5ZK3Dy7f99NwasVwyEFODJ4jqONO3aej3DlgqNIggRCruh/jq9
7NUpXbsflQdJH98Q/f+MX8aMb9eL0MvJc/kvoL/a+34NAGtHzF97ta
i+r1sra2M1fr9TAMq+GlKWqlpFkVarNejrM61nzzp7tH9NfL2wB79m
rN32CseuvU/dtfAlnR/rNb8OvcKOnG+/HDjaFHs6IPQnJ/u5wF6419
IxFaNMRzBQ+Uo964sfMWKzJJsd8950GpQWCeZamFMRcHCUMzk51auV
svVcFluMqI9b8QjY8XCPdguiwid/8RF+7Wj1t9uMfQSk+og4kzq976
6KmCSzd7e3Hl+qdFxtgpO4tZOmfRjM0jFvtjgvcZmyXsHB8py+DCR8
bOYZyx6ISdAjM1Rj4E4VPjGZvjQWZ8B+8Rc+ZhsPs86czBEp8nTn1R
UApHuEJO0DiG4JlUjDyr+WiPfWlY4jFhhgc0gPelnTE50HbeEB5I4g
1L5FqOgyxehzS4gkQBOZJB5mQKgzghswdE0TdJ4PLNRt8mn42yRK6X
oADC46+Zf6kS+vUhMSyzE/aLY5iGLsbqKBel7KcTLwvACfs58Zqf+u
ozP/pk5BmESlySucvp68YTkj7VqV+YKAACn6M4fhbpcXPgCtli1/Uc
53P/MfsPMaMyXlwHAAABC84BPD94bWwgdmVyc2lvbj0iMS4wIiBlbm
NvZGluZz0idXRmLTE2Ij8+DQo8VXJsU2V0Pg0KICA8VmVyc2lvbj4x
NS4wLjAuMDwvVmVyc2lvbj4NCiAgPFVybHM+DQogICAgPFVybCBTdG
FydEluZGV4PSIxMjAyIiBUeXBlPSJVcmwiPg0KICAgICAgPFVybFN0
cmluZz5odHRwczovL3d3dzwvVXJsU3RyaW5nPg0KICAgIDwvVXJsPg
0KICA8L1VybHM+DQo8L1VybFNldD4BDs8BUmV0cmlldmVyT3BlcmF0
b3IsMTAsMDtSZXRyaWV2ZXJPcGVyYXRvciwxMSwyO1Bvc3REb2NQYX
JzZXJPcGVyYXRvciwxMCwwO1Bvc3REb2NQYXJzZXJPcGVyYXRvciwx
MSwwO1Bvc3RXb3JkQnJlYWtlckRpYWdub3N0aWNPcGVyYXRvciwxMC
wwO1Bvc3RXb3JkQnJlYWtlckRpYWdub3N0aWNPcGVyYXRvciwxMSww
O1RyYW5zcG9ydFdyaXRlclByb2R1Y2VyLDIwLDE2
X-MS-Exchange-Forest-IndexAgent: 1 1326

I have gone through several forums to understand what encoding the above content uses and how to read it. Could you please have a look?
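By my count, the value of X-MS-Exchange-Forest-IndexAgent-0 above is 1768 base64 characters, which decodes to exactly 1326 bytes, matching the second number in "X-MS-Exchange-Forest-IndexAgent: 1 1326". That suggests, but does not prove, that the summary header holds a chunk count and a decoded size. The sketch below is built on that assumption: it reassembles headers named ...-0, ...-1, and so on from a raw .eml and base64-decodes them, assuming the chunks are consecutive slices of one base64 string. read_index_agent_blob is a hypothetical helper name.

```python
import base64
import email


def read_index_agent_blob(raw_eml: bytes, name: str = "X-MS-Exchange-Forest-IndexAgent") -> bytes:
    """Reassemble and base64-decode the IndexAgent header chunks from a raw .eml.

    Assumption: <name> holds "<chunk count> <decoded size>" and the data is
    spread over headers named <name>-0, <name>-1, ...
    """
    msg = email.message_from_bytes(raw_eml)  # default compat32 policy: raw header strings
    summary = msg.get(name)
    if summary is None:
        raise ValueError(f"{name} header not found")
    count, size = (int(field) for field in summary.split())
    chunks = []
    for i in range(count):
        chunk = msg.get(f"{name}-{i}")
        if chunk is None:
            raise ValueError(f"{name}-{i} header not found")
        chunks.append("".join(chunk.split()))  # drop header folding whitespace
    blob = base64.b64decode("".join(chunks))
    if len(blob) != size:
        raise ValueError(f"decoded {len(blob)} bytes, but the header says {size}")
    return blob
```

The size check is there so that a wrong guess about the header scheme fails loudly instead of silently returning truncated data.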

@Sam-Gracy I am sorry, I tried, but I don't think I can help you much with your question.

This looks like base64-encoded data, but I cannot figure out the encoding of what it decodes to; maybe it is just raw bytes. I am not familiar with this header.

This is how far I got...

ENCODED = """AQ0CZW4AAX4DAAAPAAADH4sIAAAAAAAEAIVU227bOBBlElu3rPcXdu
qnBkjth74FbYHsBehDigJNuot9KiiJioTQokBS0fpv/B9984fst+wh
abkq2nQBQRZnzsycOTP0v+mvWypFxXtpL0k27YOhoZGSeNcJrokbqp
SUajDUtMThyDUOQpurRXbdUt8+NqaxovSx1BhYSqFxgIm3JeWyF0DS
/8C6XndS+JS8sM2jeAqoRbnI7pruar+78WwL3pKqqFC9NoJyQcZuJc
BDY2v67fb2kqyie2ERrWwtNEmlHp4tskX29u7dDYUkL+iuFmS5dsBr
a3WT9xZ0puIg2JNC7o7fiyAT6pWN6STfwgyFHKjotRatHZUCsC3VsK
I7RUXNW4TaujGXtFU9bXpjyXSiaKrtkeGBR6X0segKTddivwuu/Y6P
HA/BjcDcECtct6oT7ZRuqYp+A0ZPJ3Ei1hyqq1Y4NV1wmHvT3tMjxx
DdwD8ZIav97gX9HkRZ0XuUMh4+FhlVMHwjDq2vLc/dJjWWBvwUsike
3Bg/5ZK3Dy7f99NwasVwyEFODJ4jqONO3aej3DlgqNIggRCruh/jq9
7NUpXbsflQdJH98Q/f+MX8aMb9eL0MvJc/kvoL/a+34NAGtHzF97ta
i+r1sra2M1fr9TAMq+GlKWqlpFkVarNejrM61nzzp7tH9NfL2wB79m
rN32CseuvU/dtfAlnR/rNb8OvcKOnG+/HDjaFHs6IPQnJ/u5wF6419
IxFaNMRzBQ+Uo964sfMWKzJJsd8950GpQWCeZamFMRcHCUMzk51auV
svVcFluMqI9b8QjY8XCPdguiwid/8RF+7Wj1t9uMfQSk+og4kzq976
6KmCSzd7e3Hl+qdFxtgpO4tZOmfRjM0jFvtjgvcZmyXsHB8py+DCR8
bOYZyx6ISdAjM1Rj4E4VPjGZvjQWZ8B+8Rc+ZhsPs86czBEp8nTn1R
UApHuEJO0DiG4JlUjDyr+WiPfWlY4jFhhgc0gPelnTE50HbeEB5I4g
1L5FqOgyxehzS4gkQBOZJB5mQKgzghswdE0TdJ4PLNRt8mn42yRK6X
oADC46+Zf6kS+vUhMSyzE/aLY5iGLsbqKBel7KcTLwvACfs58Zqf+u
ozP/pk5BmESlySucvp68YTkj7VqV+YKAACn6M4fhbpcXPgCtli1/Uc
53P/MfsPMaMyXlwHAAABC84BPD94bWwgdmVyc2lvbj0iMS4wIiBlbm
NvZGluZz0idXRmLTE2Ij8+DQo8VXJsU2V0Pg0KICA8VmVyc2lvbj4x
NS4wLjAuMDwvVmVyc2lvbj4NCiAgPFVybHM+DQogICAgPFVybCBTdG
FydEluZGV4PSIxMjAyIiBUeXBlPSJVcmwiPg0KICAgICAgPFVybFN0
cmluZz5odHRwczovL3d3dzwvVXJsU3RyaW5nPg0KICAgIDwvVXJsPg
0KICA8L1VybHM+DQo8L1VybFNldD4BDs8BUmV0cmlldmVyT3BlcmF0
b3IsMTAsMDtSZXRyaWV2ZXJPcGVyYXRvciwxMSwyO1Bvc3REb2NQYX
JzZXJPcGVyYXRvciwxMCwwO1Bvc3REb2NQYXJzZXJPcGVyYXRvciwx
MSwwO1Bvc3RXb3JkQnJlYWtlckRpYWdub3N0aWNPcGVyYXRvciwxMC
wwO1Bvc3RXb3JkQnJlYWtlckRpYWdub3N0aWNPcGVyYXRvciwxMSww
O1RyYW5zcG9ydFdyaXRlclByb2R1Y2VyLDIwLDE2"""


import base64
import chardet


if __name__ == "__main__":
    a = base64.b64decode(ENCODED)
    print(a)
    print(chardet.detect(a))

And I received this...

Scroll to the end: it looks like there is an XML document embedded.

❯ python3.8 main.py 
b'\x01\r\x02en\x00\x01~\x03\x00\x00\x0f\x00\x00\x03\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\x85T\xdbn\xdb8\x10e\x12[\xb7\xac\xf7\x17v\xea\xa7\x06H\xed\x87\xbe\x05m\x81\xec\x05\xe8C\x8a\x02M\xba\x8b}*(\x89\x8a\x84\xd0\xa2@R\xd1\xfao\xfc\x1f}\xf3\x87\xec\xb7\xec!i\xb9*\xdat\x01A\x16g\xce\xcc\x9c93\xf4\xbf\xe9\xaf[*E\xc5{i/I6\xed\x83\xa1\xa1\x91\x92x\xd7\t\xae\x89\x1b\xaa\x94\x94j0\xd4\xb4\xc4\xe1\xc85\x0eB\x9b\xabEv\xddR\xdf>6\xa6\xb1\xa2\xf4\xb1\xd4\x18XJ\xa1q\x80\x89\xb7%\xe5\xb2\x17@\xd2\xff\xc0\xba^wR\xf8\x94\xbc\xb0\xcd\xa3x\n\xa8E\xb9\xc8\xee\x9a\xeej\xbf\xbb\xf1l\x0b\xde\x92\xaa\xa8P\xbd6\x82rA\xc6n%\xc0Cck\xfa\xed\xf6\xf6\x92\xac\xa2{a\x11\xadl-4I\xa5\x1e\x9e-\xb2E\xf6\xf6\xee\xdd\r\x85$/\xe8\xae\x16d\xb9v\xc0kku\x93\xf7\x16t\xa6\xe2 \xd8\x93B\xee\x8e\xdf\x8b \x13\xea\x95\x8d\xe9$\xdf\xc2\x0c\x85\x1c\xa8\xe8\xb5\x16\xad\x1d\x95\x02\xb0-\xd5\xb0\xa2;EE\xcd[\x84\xda\xba1\x97\xb4U=mzc\xc9t\xa2h\xaa\xed\x91\xe1\x81G\xa5\xf4\xb1\xe8\nM\xd7b\xbf\x0b\xae\xfd\x8e\x8f\x1c\x0f\xc1\x8d\xc0\xdc\x10+\\\xb7\xaa\x13\xed\x94n\xa9\x8a~\x03FO\'q"\xd6\x1c\xaa\xabV85]p\x98{\xd3\xde\xd3#\xc7\x10\xdd\xc0?\x19!\xab\xfd\xee\x05\xfd\x1eDY\xd1{\x942\x1e>\x16\x19U0|#\x0e\xad\xaf-\xcf\xdd&5\x96\x06\xfc\x14\xb2)\x1e\xdc\x18?\xe5\x92\xb7\x0f.\xdf\xf7\xd3pj\xc5p\xc8AN\x0c\x9e#\xa8\xe3N\xdd\xa7\xa3\xdc9`\xa8\xd2 
\x81\x10\xab\xba\x1f\xe3\xab\xde\xcdR\x95\xdb\xb1\xf9Pt\x91\xfd\xf1\x0f\xdf\xf8\xc5\xfch\xc6\xfdx\xbd\x0c\xbc\x97?\x92\xfa\x0b\xfd\xaf\xb7\xe0\xd0\x06\xb4|\xc5\xf7\xbbZ\x8b\xea\xf5\xb2\xb6\xb63W\xeb\xf50\x0c\xab\xe1\xa5)j\xa5\xa4Y\x15j\xb3^\x8e\xb3:\xd6|\xf3\xa7\xbbG\xf4\xd7\xcb\xdb\x00{\xf6j\xcd\xdf`\xacz\xeb\xd4\xfd\xdb_\x02Y\xd1\xfe\xb3[\xf0\xeb\xdc(\xe9\xc6\xfb\xf1\xc3\x8d\xa1G\xb3\xa2\x0fBr\x7f\xbb\x9c\x05\xeb\x8d}#\x11Z4\xc4s\x05\x0f\x94\xa3\xde\xb8\xb1\xf3\x16+2I\xb1\xdf=\xe7A\xa9A`\x9ee\xa9\x851\x17\x07\tC3\x93\x9dZ\xb9[/U\xc1e\xb8\xca\x88\xf5\xbf\x10\x8d\x8f\x17\x08\xf7`\xba,"w\xff\x11\x17\xee\xd6\x8f[}\xb8\xc7\xd0JO\xa8\x83\x893\xab\xde\xfa\xe8\xa9\x82K7{{q\xe5\xfa\xa7E\xc6\xd8);\x8bY:g\xd1\x8c\xcd#\x16\xfbc\x82\xf7\x19\x9b%\xec\x1c\x1f)\xcb\xe0\xc2G\xc6\xcea\x9c\xb1\xe8\x84\x9d\x0235F>\x04\xe1S\xe3\x19\x9b\xe3Af|\x07\xef\x11s\xe6a\xb0\xfb<\xe9\xcc\xc1\x12\x9f\'N}QP\nG\xb8BN\xd08\x86\xe0\x99T\x8c<\xab\xf9h\x8f}iX\xe21a\x86\x074\x80\xf7\xa5\x9d19\xd0v\xde\x10\x1eH\xe2\rK\xe4Z\x8e\x83,^\x874\xb8\x82D\x019\x92A\xe6d\n\x838!\xb3\x07D\xd17I\xe0\xf2\xcdF\xdf&\x9f\x8d\xb2D\xae\x97\xa0\x00\xc2\xe3\xaf\x99\x7f\xa9\x12\xfa\xf5!1,\xb3\x13\xf6\x8bc\x98\x86.\xc6\xea(\x17\xa5\xec\xa7\x13/\x0b\xc0\t\xfb9\xf1\x9a\x9f\xfa\xea3?\xfad\xe4\x19\x84J\\\x92\xb9\xcb\xe9\xeb\xc6\x13\x92>\xd5\xa9_\x98(\x00\x02\x9f\xa38~\x16\xe9qs\xe0\n\xd9b\xd7\xf5\x1c\xe7s\xff1\xfb\x0f1\xa32^\\\x07\x00\x00\x01\x0b\xce\x01<?xml version="1.0" encoding="utf-16"?>\r\n<UrlSet>\r\n  <Version>15.0.0.0</Version>\r\n  <Urls>\r\n    <Url StartIndex="1202" Type="Url">\r\n      <UrlString>https://www</UrlString>\r\n    </Url>\r\n  </Urls>\r\n</UrlSet>\x01\x0e\xcf\x01RetrieverOperator,10,0;RetrieverOperator,11,2;PostDocParserOperator,10,0;PostDocParserOperator,11,0;PostWordBreakerDiagnosticOperator,10,0;PostWordBreakerDiagnosticOperator,11,0;TransportWriterProducer,20,16'
{'encoding': 'Windows-1254', 'confidence': 0.22495847050694842, 'language': 'Turkish'}
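The bytes \x1f\x8b\x08 near the start of the decoded output are the gzip magic number (0x1f 0x8b) followed by compression method 8 (deflate), so part of this blob appears to be a gzip stream; chardet's low-confidence guess (about 0.22) is consistent with the data not being text at all. Below is a minimal standard-library sketch, assuming the gzip member starts at the first occurrence of those three bytes and the XML root is <UrlSet> as in the dump. extract_gzip_member and extract_url_set are hypothetical helper names. I have not checked what the decompressed bytes contain, so inspect them before choosing a text encoding.

```python
import re
import xml.etree.ElementTree as ET
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID1, ID2, and CM=8 (deflate)


def extract_gzip_member(blob: bytes):
    """Decompress the first gzip member in blob.

    Returns (decompressed_bytes, bytes_following_the_member).
    """
    start = blob.find(GZIP_MAGIC)
    if start < 0:
        raise ValueError("no gzip header found")
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    data = inflater.decompress(blob[start:])
    if not inflater.eof:
        raise ValueError("gzip member is truncated")
    return data, inflater.unused_data


def extract_url_set(blob: bytes):
    """Parse the embedded <UrlSet> XML into (StartIndex, Type, UrlString) tuples."""
    start = blob.find(b"<?xml")
    if start < 0:
        return []
    end_tag = b"</UrlSet>"
    end = blob.find(end_tag, start)
    if end < 0:
        return []
    # The declaration says utf-16, but in the dump above the bytes are plain ASCII,
    # so decode as ASCII and drop the declaration before handing it to ElementTree.
    text = blob[start:end + len(end_tag)].decode("ascii")
    text = re.sub(r"^<\?xml[^>]*\?>", "", text).strip()
    root = ET.fromstring(text)
    return [
        (url.get("StartIndex"), url.get("Type"), url.findtext("UrlString"))
        for url in root.iter("Url")
    ]
```

Usage with the script above: data, rest = extract_gzip_member(base64.b64decode(ENCODED)), then extract_url_set(rest). Searching only the bytes after the gzip member avoids accidentally matching "<?xml" inside compressed data.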

Thank you for your help and sorry to bother you :)