OfficeDev/ews-managed-api

Transport headers are sometimes returned with missing whitespace character(s)

andrewlamansky opened this issue · 0 comments

Background

The Item class has an InternetMessageHeaders property to get an email's header collection. However, the documentation for this property notes that it won't return the entire header collection; to access the transport headers, the PR_TRANSPORT_MESSAGE_HEADERS extended property must be used, defined as follows:

ExtendedPropertyDefinition transportMsgHdr = new ExtendedPropertyDefinition(0x007D, MapiPropertyType.String);

The documentation also links to an Exchange 2010 technical article which appears to confirm that transport headers can only be accessed as an extended property:

To get the message headers, the EWS schema exposes the InternetMessageHeaders first-class property. Unfortunately, this property does not appear to return a complete set of message headers. For example, in Exchange 2010 Service Pack 1 (SP1), the InternetMessageHeaders property doesn't return address headers such as From and To. I suggest that you use the PR_TRANSPORT_MESSAGE_HEADERS property, as specified in [MS-OXPROPS]: Exchange Server Protocols Master Property List section 2.1103.

The Bug

The problem is that when headers are accessed as an extended property, the formatting of some multiline headers does not comply with the rule for long header fields in RFC 822 (section 3.1.1). To quote the RFC:

Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body. For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding". The general rule is that wherever there may be linear-white-space (NOT simply LWSP-chars), a CRLF immediately followed by AT LEAST one LWSP-char may instead be inserted.

As an example, here is an illustrative multiline X- header:

X-Folded-Header: this is a valid multiline header
	because each new line is immediately followed
	by at least
	one whitespace character

Note that after each CRLF, there is a whitespace character (tab) to indicate that the subsequent line is still part of X-Folded-Header.

Now, here is the same header as returned by the extended property lookup in EWS:

X-Folded-Header: this is a valid multiline header\r\nbecause each new line is immediately followed\r\nby at least\r\none whitespace character\r\n

Each CRLF is represented by the character sequence \r\n, but the required whitespace character (e.g. \t) that marks each new line as part of the same header is missing. As a result, a header parser will attempt to read each \r\n-delimited string as a new header, with unpredictable results.
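To make the failure mode concrete, here is a minimal sketch of a header parser that applies the folding rule quoted above. It is not the parser used by EWS or by any particular mail library; the function name parse_headers and the two sample constants are hypothetical and exist only for illustration.

```python
# Minimal sketch of RFC 822 style header unfolding (illustrative only).
# parse_headers, COMPLIANT and AS_RETURNED are hypothetical names.

COMPLIANT = (
    "X-Folded-Header: this is a valid multiline header\r\n"
    "\tbecause each new line is immediately followed\r\n"
    "\tby at least\r\n"
    "\tone whitespace character\r\n"
)

# Same header as described in this issue: CRLF without the leading tab.
AS_RETURNED = (
    "X-Folded-Header: this is a valid multiline header\r\n"
    "because each new line is immediately followed\r\n"
    "by at least\r\n"
    "one whitespace character\r\n"
)


def parse_headers(raw):
    """Return (headers, malformed) for a raw CRLF-separated header block.

    headers   -- list of (field-name, field-body) tuples, with folded
                 lines joined back onto the header they belong to
    malformed -- lines that are neither a header nor a continuation
    """
    headers = []
    malformed = []
    for line in raw.split("\r\n"):
        if not line:
            continue  # skip the empty string after the final CRLF
        if line[0] in " \t":
            # Leading space or tab marks a continuation of the previous header.
            if headers:
                name, value = headers[-1]
                # Whitespace is normalised to one space for readability.
                headers[-1] = (name, value + " " + line.strip())
            else:
                malformed.append(line)
            continue
        name, sep, value = line.partition(":")
        if sep and name and not any(c.isspace() for c in name):
            headers.append((name, value.strip()))
        else:
            # No "field-name:" prefix and no leading whitespace.
            malformed.append(line)
    return headers, malformed


if __name__ == "__main__":
    print(parse_headers(COMPLIANT))
    print(parse_headers(AS_RETURNED))
```

With the compliant input, the sketch yields a single X-Folded-Header entry containing the full value. With the unindented input, only the first line is kept as the header value and the remaining three lines are reported as malformed. In real headers such as DKIM-Signature, an unindented continuation line may itself contain a colon, in which case a parser like this one would misread it as a separate header rather than flag it.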

X-Folded-Header is a fictitious example, but the same missing-whitespace bug has been observed in DKIM-Signature and Received headers returned from the extended property lookup in a production context. There does not appear to be a workaround available, since neither the Item.InternetMessageHeaders property nor the Item.MimeContent property will expose the message's transport headers.

Steps to Reproduce

  1. Copy all lines of the correctly folded X-Folded-Header example from the previous section (the version with a leading tab on each continuation line), and paste them as a header in any .eml email file.
  2. Use EWS to get the X-Folded-Header header from the .eml file via the PR_TRANSPORT_MESSAGE_HEADERS property as advised in the documentation.
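For step 1, a test file can also be generated rather than edited by hand, which avoids an editor silently changing the tabs or line endings. The following is a rough sketch under stated assumptions: the function name build_eml, the output file name, and the example.com addresses are placeholders, and the message is assembled as raw bytes so that no library re-folds the header.

```python
# Sketch: write a minimal .eml file containing the folded X-Folded-Header.
# build_eml, the file name and the addresses are hypothetical placeholders.

CRLF = "\r\n"


def build_eml():
    """Return the bytes of a minimal message with a correctly folded header."""
    header_lines = [
        "From: sender@example.com",
        "To: recipient@example.com",
        "Subject: Folded header test",
        "X-Folded-Header: this is a valid multiline header",
        "\tbecause each new line is immediately followed",  # tab = continuation
        "\tby at least",
        "\tone whitespace character",
    ]
    body = "Test message body."
    # An empty line (CRLF CRLF) separates the headers from the body.
    return (CRLF.join(header_lines) + CRLF + CRLF + body + CRLF).encode("ascii")


if __name__ == "__main__":
    # Binary mode keeps the CRLF line endings exactly as built.
    with open("folded-header-test.eml", "wb") as f:
        f.write(build_eml())
```

Comparing the bytes of this file with the string returned in step 2 shows whether the tab after each CRLF was lost along the way.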