Fails to Parse Body and Attachments if an Extra, Unused Boundary is Present in the Header Lines
Closed this issue · 2 comments
(Reproduction in the next comment)
As I was trying to switch an app off an old version of mailparser (that was being used in the browser) I noticed that a handful of emails I was spot checking in our system were not having any content display or attachments show up (neither did they work in the test harness here https://kreata.ee/postal-mime/example/)
As I was debugging it, I noticed the problematic ones all had an extra "Content-Type" and "boundary" declaration somewhere in the header lines that didn't actually seem to be getting used anywhere else (and was in fact listed earlier than the one that was actually used throughout the rest of the file) and the rest of the ones that worked didn't have that.
Via browser DevTools I eventually narrowed it down to this part of the codebase (
Line 237 in 3546951
I'm unsure if these emails are poorly formatted or if this is an expected state, but assuming the latter would definitely be interested in working on a PR for this if I could be pointed in the right direction
Ok I stripped it down a lot, but I think this example captures what I was seeing.
From: Sender <sender@senderdomain.com>
Date: Tue, 10 May 2022 19:35:09 -0400
Subject: The Subject of the Email
Delivered-To: someone@somewhere.com
Reply-To: sender@senderdomain.com
To: someone@somewhere.com
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_Part_89632_396786613.1652225709294"
Precedence: bulk
Content-Type: multipart/alternative; boundary="=-LoLLfRSB+N6tHPJ/2uLBsA=="
--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/plain; charset=utf-8
Here is the text of the email
--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/html; charset=utf-8
<html><body>Here is the text of the email</body></html>
--=-LoLLfRSB+N6tHPJ/2uLBsA==--
When I run that through https://kreata.ee/postal-mime/example/ I get this, which only includes the subject, from line, to line, and the date
However, once I remove that first Content-Type and boundary that isn't used elsewhere
From: Sender <sender@senderdomain.com>
Date: Tue, 10 May 2022 19:35:09 -0400
Subject: The Subject of the Email
Delivered-To: someone@somewhere.com
Reply-To: sender@senderdomain.com
To: someone@somewhere.com
MIME-Version: 1.0
Precedence: bulk
Content-Type: multipart/alternative; boundary="=-LoLLfRSB+N6tHPJ/2uLBsA=="
--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/plain; charset=utf-8
Here is the text of the email
--=-LoLLfRSB+N6tHPJ/2uLBsA==
Content-Type: text/html; charset=utf-8
<html><body>Here is the text of the email</body></html>
--=-LoLLfRSB+N6tHPJ/2uLBsA==--
and re-run it through the test harness, the text and html of the email show up.
I noticed that nodemailer/mailparser loops over the header lines from top to bottom, and then only keeps around the last Content-Type header it came across - https://github.com/nodemailer/mailparser/blob/b6bba6edd16de30c57566d776b59675c24ed7064/lib/mail-parser.js#L385
However postal-mime loops over the header lines from top to bottom (also keeping the last Content-Type header it came across) -
Line 200 in 3546951
This seems to make the difference for these cases for us, as if I reverse the headerLines via the DevTools Console before they're processed, the issue is fixed.
I'm not sure if this is the best overall fix, but the increased consistency with this would seem to help us.
I will go ahead and spin up a PR to propose this change