spatie/phpunit-snapshot-assertions

Incorrect HTML output

Closed this issue · 10 comments

We have many emails generated by MJML that use conditional comments. There is a particular conditional comment to Prevent IE Edge Meta Tag from Breaking Images in Live Mail that looks like this...

<!--[if !mso]><!-->
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
<!--<![endif]-->

When this is run though assertMatchesHtmlSnapshot() it gets changed, incorrectly, to...

<!--[if !mso]><!--><meta http-equiv="X-UA-Compatible" content="IE=edge">\n
<!--<![endif]-->

Adding LIBXML_HTML_NODEFDTD as the second parameter to loadHtml() here seems to fix the "issue"

I'm not sure why, hence why I have created an issue rather than a PR.

Here's a failing and passing example ready to run in Tinkerwell.

Failing example

$data = <<<EOT
<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">

<head>
    <title></title>
    <!--[if !mso]><!-->
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <!--<![endif]-->
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
    <p>Hello</p>
</body>

</html>
EOT;


$domDocument = new DOMDocument('1.0');
$domDocument->preserveWhiteSpace = false;
$domDocument->formatOutput = true;

@$domDocument->loadHTML($data); // to ignore HTML5 errors

$domDocument->saveHTML();

// OUTPUT:

// <!DOCTYPE html>\n
// <html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">\n
// <head>\n
// <title></title>\n
// <!--[if !mso]><!--><meta http-equiv="X-UA-Compatible" content="IE=edge">\n
// <!--<![endif]--><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n
// </head>\n
// <body>\n
//     <p>Hello</p>\n
// </body>\n
// </html>\n

Passing example

$data = <<<EOT
<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">

<head>
    <title></title>
    <!--[if !mso]><!-->
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <!--<![endif]-->
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
    <p>Hello</p>
</body>

</html>
EOT;


$domDocument = new DOMDocument('1.0');
$domDocument->preserveWhiteSpace = false;
$domDocument->formatOutput = true;

@$domDocument->loadHTML($data, LIBXML_HTML_NODEFDTD); // to ignore HTML5 errors

$domDocument->saveHTML();

// OUTPUT:

// <!DOCTYPE html>\n
// <html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">\n
// \n
// <head>\n
//     <title></title>\n
//     <!--[if !mso]><!-->\n
//     <meta http-equiv="X-UA-Compatible" content="IE=edge">\n
//     <!--<![endif]-->\n
//     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n
// </head>\n
// <body>\n
//     <p>Hello</p>\n
// </body>\n
// \n
// </html>\n

Could you PR the fix with LIBXML_HTML_NODEFDTD?

Thanks!

@freekmurze Im not sure if this needs further investigation. See this table of results on varying systems

PHP LibXML Correct output? Comments
8.1.6 2.9.4 M1 Mac Studio. macOS Moneteray 12.4. PHP installed via brew
8.1.0 2.9.4 Intel iMac. macOS BigSur 11.6.5. PHP installed via MAMP
8.1.5 2.9.13 GitHub action, PHP installed using shivammathur/setup-php@v2
8.1.6 2.9.1 Online PHP runner

I thought at first it may have been the LibXML version that may be different, but I think my testing proves otherwise.

We have 2 new M1 Mac Studios and we get the correct result on both of those, but on all other systems we test on, we get the "incorrect" output. Im starting to question which is the correct/incorrect output now 😕

Adding LIBXML_HTML_NODEFDTD makes all the above pass the correct output test.

Do you have any thoughts on this?

Thanks for having investigated this. If have no thoughts on this besides that for our projects the current code seems to just work on the systems we use. Feel free to PR that LIBXML_HTML_NODEFDTD if that fixes the problems on all systems.

Looks like the changes related to this issue and #141 PR have caused failing tests in my project.

Was able to track this down by observing that tests passed on versions when I used composer update --prefer-lowest --prefer-stable. However if I pinned this package to ^4.2.12 they began failing again. In that way I was able to rule out any of my projects regular package dependencies as the cause of the failing tests. For now as a workaround I've tagged to 4.2.11 in my package.

Here's the lines causing the failure:

--- Expected
+++ Actual
@@ @@
-'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">\n
-<html><body>\n
+'<html><body>\n

Frankly I can probably just update the snapshots and move about with life. But I wanted to put a comment here as if it affects other people and other projects then this change could be something worth considering as a bug. However I'd leave that to @freekmurze to make the call on if this change in behaviour should be designated a bug. Freek if you do want to treat it as a bug, then instead of updating snapshots I will file a new issue and investigate.

@mallardduck what did your raw source input look like?

@JayBizzle The diff of the error shows the code changes.

The code I've input to the assertion does not have doctype. However I believe that prior to 4.1.12 the library added the doctype.

While one could argue that this change makes the assertion more accurate to the input HTML being compared, it's still caused a breaking change in this case. It just seems that unfortunately this wasn't a case covered under the test suite when #141 was merged in so it went unnoticed.

Yes, I agree with the breaking change statement, but there is no way that I believe the HTML snapshots should change the source input at all.

Even with the change I made in the PR, there are still minor things that get changed once your source HTML has been run through DOMDocument in the HTML assertion.

You're basically asserting against things that don't actually appear in production code. Might seem minor, but when your dealing with HTML for emails, and knowing how flakey email clients can be rendering them, I want my assertion to match the raw source entirely.

We are considering switching to the plain text assertions to get around this problem, just a shame you can't specify the file extension of the snapshot, so we could specify .html

Honestly I would agree mostly about comparing the raw source being best - but if there's a sane "wrapping and unwrapping" applied uniformly then it's acceptable to not compare raw HTML input. For me the "pain" was just the breaking change without a major version.

The real question is where to go from here? One obvious solution would be to just revert the tag and retag it as a major release. However if the goal is to make HTML assertions even more accurate 1:1 comparisons - which may or may not introduce more breaks - maybe a revert is more apt so all those changes can be included in a major?

Yes, I was actually surprised the PR got merged so quickly, especially as I had it marked as a draft. A major version release is what I would have suggested had it been discussed further.

I'm surprised there wasn't more feedback regarding the breaking change after that PR was released.

I guess we need to get some feedback from @freekmurze as to where to go from here.

I treated it as a bug fix release, as you could consider that extra HTML that was added in earlier versions was a bug. I very much liked the new shorter output, and wanted to have that available straight away.

In hindsight, I agree that it should have been released as a major version as this indeed breaks tests (though easily fixed by regenerating snapshots).

In the future when such improvements are made, I'll be sure to tag a major.

Thanks again for your work on this @JayBizzle