postalsys/postal-mime

Fails to Parse Body and Attachments if an Extra, Unused Boundary is Present in the Header Lines

Closed this issue · 2 comments

(Reproduction in the next comment)

As I was trying to switch an app off an old version of mailparser (that was being used in the browser) I noticed that a handful of emails I was spot checking in our system were not having any content display or attachments show up (neither did they work in the test harness here https://kreata.ee/postal-mime/example/)

As I was debugging it, I noticed the problematic ones all had an extra "Content-Type" and "boundary" declaration somewhere in the header lines that didn't actually seem to be getting used anywhere else (and was in fact listed earlier than the one that was actually used throughout the rest of the file) and the rest of the ones that worked didn't have that.

Via browser DevTools I eventually narrowed it down to this part of the codebase (

this.postalMime.boundaries.push({
) where it seemed like what was happening was the extra boundary was getting registered in the loop above as the one Content-Type and boundary to start with (the loop seemed to be going from bottom to top of the headers) but then since it didn't match anything, later on downstream none of the body processing seem to be working (in and around postal-mime.js::processLine)

I'm unsure if these emails are poorly formatted or if this is an expected state, but assuming the latter would definitely be interested in working on a PR for this if I could be pointed in the right direction

Ok I stripped it down a lot, but I think this example captures what I was seeing.

From: Sender <sender@senderdomain.com>
Date: Tue, 10 May 2022 19:35:09 -0400
Subject: The Subject of the Email
Delivered-To: someone@somewhere.com
Reply-To: sender@senderdomain.com
To: someone@somewhere.com
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_89632_396786613.1652225709294"
Precedence: bulk
Content-Type: multipart/alternative; boundary="=-LoLLfRSB+N6tHPJ/2uLBsA=="

--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/plain; charset=utf-8

Here is the text of the email

--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/html; charset=utf-8

<html><body>Here is the text of the email</body></html>

--=-LoLLfRSB+N6tHPJ/2uLBsA==--

When I run that through https://kreata.ee/postal-mime/example/ I get this, which only includes the subject, from line, to line, and the date

image

However, once I remove that first Content-Type and boundary that isn't used elsewhere

From: Sender <sender@senderdomain.com>
Date: Tue, 10 May 2022 19:35:09 -0400
Subject: The Subject of the Email
Delivered-To: someone@somewhere.com
Reply-To: sender@senderdomain.com
To: someone@somewhere.com
MIME-Version: 1.0
Precedence: bulk
Content-Type: multipart/alternative; boundary="=-LoLLfRSB+N6tHPJ/2uLBsA=="

--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/plain; charset=utf-8

Here is the text of the email

--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/html; charset=utf-8

<html><body>Here is the text of the email</body></html>

--=-LoLLfRSB+N6tHPJ/2uLBsA==--

and re-run it through the test harness, the text and html of the email show up.

image

I noticed that nodemailer/mailparser loops over the header lines from top to bottom, and then only keeps around the last Content-Type header it came across - https://github.com/nodemailer/mailparser/blob/b6bba6edd16de30c57566d776b59675c24ed7064/lib/mail-parser.js#L385

However postal-mime loops over the header lines from top to bottom (also keeping the last Content-Type header it came across) -

for (let i = this.headerLines.length - 1; i >= 0; i--) {

This seems to make the difference for these cases for us, as if I reverse the headerLines via the DevTools Console before they're processed, the issue is fixed.

I'm not sure if this is the best overall fix, but the increased consistency with this would seem to help us.

I will go ahead and spin up a PR to propose this change